INTRODUCTION


This issue more nearly resembles the parallel issue in the 1951 cycle than
its predecessor of 1954. Whereas the latter was devoted to a review of 
statistical developments, the former attempted to review developments in 
research methodology generally. However, we have set ourselves an addi- 
tional task. Specifically this issue attempts to: 


1. Take note of the significant literature published in the intervening 
period since the last Review coverage of certain familiar areas of 
research methodology 

2. Introduce new topics of immediate or potential importance to those 
engaged in educational research work. 


This latter goal is exemplified in the issue’s first chapter. This discussion 
of the philosophy of science was included to point up the importance of 
the field’s developments for the model building and theorizing upon which 
the future progress of much of educational research so heavily depends. 
Another example is Chapter II on cross-cultural methods. It spotlights a 
method which some workers believe holds the key to an interdisciplinary 
approach to some of education’s important problems, and it emphasizes 
the “science” aspect of the “social sciences.” Still another example is 
Chapter IX. The electronic calculator has become more than a slave to do 
routine problems. It opens up new avenues of research and makes practical 
the use of research methods which were formerly given but fleeting 
attention because they were too complex and cumbersome. 

The original design of the issue called for a chapter to continue the 
excellent discussion of decision theory in the December 1954 issue of 
the Review. While several authors expressed considerable faith that in the 
future exciting and valuable developments would stem from this area of 
investigation, they agreed that it was too early to re-examine this field. 

The problem of facilitating the use of research results has become 
almost as crucial as that of developing new research findings. Chapter X 
discusses one method of attacking this problem, action research, and 
points to an important weakness in conventional research methodology 
to which some attention should be given. 

The organization of this issue roughly corresponds to the time sequence 
in which questions might present themselves in the research process: (a) a 
philosophical consideration of the research process and the function of 
models (Chapter I); (b) the consideration of the various research 
methods, the choice of technics, and the design of the experiment (Chapter 
II thru the Status Studies of Chapter IV); (c) the consideration of various
research tools to develop problem treatment (the “sampling” part of 
Chapter IV thru Chapter VIII); (d) data processing (Chapter IX); and 
(e) use of results (Chapter X). 

The increased scope of this issue has forced even greater emphasis on
the Review’s traditional policy of being selective rather than comprehensive
in its bibliographic coverage. While this is specifically noted in the
introductions of certain chapters, it is true of nearly all of them. In this 
field of the Review’s cycle increasing pressure for selection comes both 
from the intensive development of existing technics and from the continued 
addition of new ones. Consideration should be given in forthcoming num- 
bers to attempting more regular and intensive coverage of a limited field 
on alternate cycles (e.g., the December 1954 Review) and/or depending 
more heavily on other issues in the cycle to help complete coverage. 

To some extent this latter policy is already in effect. While the unity 
of educational knowledge is such that probably no Review issue finds the 
borders covered solely within its pages, this area is especially fortunate 
in that help is obtained from several parts of the cycle. For instance, test 
methodology and measurement theory are covered in more detail in the 
Review of February 1956. Similarly the issue, “Human Relations and 
Programs of Action,” of October 1953 supplements several chapters in 
this number. The June 1956 Review, “Twenty-Five Years of Educational 
Research,” contains an excellent chapter, “Methods of Research,” with a 
splendid book bibliography. The February 1957 issue of the Review 
summarizes educational research in countries outside the United States 
and notes advances in research methodology where they exist in that 
literature. Finally, our parallel issue in the previous cycle, “Statistical 
Methodology in Educational Research,” published in December 1954, is an 
excellent intensive review of one facet of the research methodologies field, 
and is far from outdated. 

We hope that readers will find our coverage of familiar topics more 
than adequate and that the review of additional areas will contribute 
significantly to the improvement of educational research. 


DAVID R. KRATHWOHL, Chairman
Committee on Methodology of 
Educational Research 








CHAPTER I 


The Philosophy of Science and Educational Research 


MAY BRODBECK* 


Scientists do science. They formulate concepts with which to describe 
the facts they find. They look for laws connecting some of these facts with 
others. They try to formulate theories in order to explain known facts 
and laws and to help find new ones. 

Philosophers of science talk about science. They try to clarify the nature 
of scientific concepts, laws, and theories. How do concepts acquire mean- 
ing? What is the logical form of a law and of a theory? What is the 
meaning of explanation in science? Of causality? What is a model? Is 
there any connection between scientific description and the values scientists 
hold? These are some of the more general questions philosophers of 
science try to answer. They may also analyze the concepts, laws, and 
theories of specific sciences like physics, psychology, or biology. Thus, the 
logical structure of relativity theory in physics or the difference between 
vitalism and mechanism in biology might be such special issues engaging 
the attention of philosophers of science. The enterprise is clarificatory. It 
seeks neither new factual knowledge nor new technics but, by logical 
analysis, clarification of the knowledge and methods we have. 

Occasionally, this analysis implies a criticism of some of the things 
scientists say about what they do. Sometimes this criticism is helpful to 
the working scientist and contributes to greater achievement. In any 
case, clarity and understanding never do any harm. 

The less well-developed a science is, the more germane will be the gen- 
eral analyses of philosophy of science. Physicists may perhaps be spared 
a lesson in how to formulate precise, meaningful concepts. Social scientists 
can still profit from such lessons. Educational research is part of the less- 
developed social and psychological sciences. I shall, therefore, concentrate 
on recent analyses by philosophers of science of some more general prob- 
lems of meaning and explanation in science. 


Operationism 


What is the principle of proper concept formation in science? This 
question is fundamental, for the kind of answer given determines also the 
answers to many other questions such as those about the nature of 
causality and of induction. In common sense, the meaning of a term 
referring to a physical object, like a dog or a chair, is given by listing 
the observable attributes of these objects, like barking and shape. But 
science is not concerned with the meaning of physical-object terms. It
takes these for granted as given by common sense. Its concern is with
terms referring to rather more abstract properties of physical objects. The
characteristic abstractness of scientific concepts, like mass or IQ, lies
in the fact that these terms cannot be defined by simply listing a cluster
of directly observable attributes. Merely by looking at a surface we can
tell whether it is red or by looking at an object whether it is a dog. We
cannot so simply tell what the mass of an object or the IQ of a child is.
Yet, we must know what to look for in order to tell whether statements
that an object has a given mass or a child a certain IQ are true or false.
How then are these scientific words to be defined?

* This paper was written during the tenure of a Faculty Research Fellowship from the Social Science
Research Council. The author is indebted to the Council for the time thus granted.

Operationism answered this question by making explicit what had been 
customary practice within physical science for generations. It pointed out 
that scientific terms must be defined, not in isolation as in a dictionary, 
but by stating the observable conditions under which a sentence contain- 
ing the term is true or false. More specifically, an operational definition 
has the form of a conditional or if-then sentence. The antecedent or if 
clause of this sentence states the test or stimulus conditions or what must
be done in order to make certain observations. The consequent clause 
states the truth or response conditions or what must be observed after 
the test conditions have been imposed. In the case of quantitative concepts, 
these test conditions consist of certain measuring procedures or operations 
such as weighing on a balance or giving an examination. The truth con- 
ditions state what must be observed after these operations or manipula- 
tions have been carried out. Terms referring to personality traits, attitudes, 
and abilities must clearly also be defined in terms of behavior that is 
exhibited under certain conditions. All terms, whether quantitative or not, 
requiring the if-then form of definition may be called dispositional con- 
cepts (3). An operational definition, like any other definition, is a state- 
ment about the use of words, stating how one term may be eliminated by 
means of others. Like any other definition, it is thus purely verbal or 
tautological. 
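
The if-then form just described can be set out schematically. What follows is only a sketch: the symbols T, R, and Q do not appear in the discussion above and are introduced here as assumptions, with T(x) standing for the test condition imposed on x, R(x) for the response then observed, and Q(x) for the term being defined.

```latex
% Schematic operational definition (notation assumed, not the author's).
% Q is introduced as shorthand for the conditional on the right.
\[
  Q(x) \;=_{\mathrm{df}}\; \bigl( T(x) \rightarrow R(x) \bigr)
\]
% Example reading: "x weighs five grams" =df
% "if x is placed on a balance, then the pointer rests at five."
```

Because Q is, on this reading, no more than an abbreviation for the conditional, it can always be replaced by that conditional, which is the sense in which such a definition is purely verbal.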


Criticisms 


Once a heroic rallying cry in the behavior sciences, operationism is 
now everyday good scientific practice there as elsewhere. It is not a 
special philosophical or methodological position. It merely clarifies the 
form definitions of scientific concepts must take in order to determine 
when statements containing these terms are true or false. As the study 
of man came of age scientifically, it inevitably adopted the practice of 
defining its terms. In fact, the practice is a necessary condition for such 
coming of age. Yet this modest proposal for forming empirically meaning- 
ful concepts, dignified by the title, operationism, has had its critics. 

There has always been criticism from the right, that is, from sources 
essentially hostile to a science of behavior. Life and space are too short for 
more than one brief comment on these last-ditch defenders of lost causes. 
Their essentially antiscientific plaint is to the effect that definition, operational
or otherwise, deprives science of the rich halo of meanings surrounding
terms in ordinary use. So it does, and so it should. The scientist
may draw upon this halo for hunches about laws, but if he wants objective 
knowledge of behavior, he cannot carry over the vagueness of ordinary 
usage into his technical vocabulary. Or so it would seem. Yet rather early 
in the game, operational definitions were attacked as unduly restrictive 
by sources undeniably favorable to a science of behavior (11, 14, 15). 
Since operational definitions are, after all, just definitions, it is not sur- 
prising that, in fact, the strictures were generalized to embrace the whole 
principle of empiricism that concepts to be meaningful must be defined in 
terms of observable properties of things. Instead of the narrow criterion 
of definability, it was urged that the meaning of a term must be left open. 
This openness is gradually filled in, tho never closed, as we increase our 
knowledge. Or, as it is also put, the meaning of a term is given not 
explicitly by definition, but implicitly by the set of laws or theory in which 
the term occurs. 


Explicit Definition and the Nomological Network 

What are the arguments for and the merits of the view that meaning 
cannot be given by explicit definition? 

There are several different ways of measuring the length of an object, 
hunger in a laboratory animal, or the IQ of a child. If the meaning of a
term is given explicitly by its definition, all these different antecedent
measuring conditions result in different definitions for length, hunger, or 
IQ. Yet we may have good reason for believing, or even just feel, that 
these all measure the same thing. Moreover, we may continue to devise new 
test conditions for the presence or absence of the property in question. 
Any definition by means of a single test condition for the presence and 
absence of an attribute, therefore, fails to capture the full meaning of 
the term. And since we are always adding to the list, the meaning of a 
concept is never more than partially determined. The group of alterna- 
tive criteria for the application of the term does not, therefore, literally 
define it. The unending list of test and truth conditions does not permit 
the elimination of the term. Accordingly, it is said to be reduced to, 
rather than defined by, the set of if-then sentences about the conditions 
when the term is applicable. The latter, in turn, are called reduction sen- 
tences and only partially specify the meaning of the term (11, 14). 
Definability was thus liberalized into reducibility. The latter counte- 
nanced, if it did not advocate, terminological indefiniteness. It is perhaps 
not surprising that empirical reference was soon to be drowned in a sea 
of context. Consider certain clinical and social concepts, like anxiety or 
group morale. What kinds of observable behavior shall constitute the 
definition of such terms? Answers to a test may give a high anxiety score. 
But people with identical scores also exhibit many different varieties of 
behavior. To really know what anxiety means, it seems reasonable to say 
that one must know not merely the test scores but all the characteristics
predictable on the basis of them. Such concepts, according to the most 
lucid exposition of this view (12), are postulated attributes or constructs 
which are reflected, but not defined, by the test performance and all other 
behaviors positively correlated with it. Their full meaning is given by the 
set of laws in which they occur. 

This set of laws is called a nomological network. The more incomplete 
the network, that is, the less we know, the vaguer are our concepts. Until 
we know all the laws in which a term occurs, we do not know precisely 
what it means. It is only partially defined. After we have all the laws, 
the term is implicitly defined by this network. Reverberating in this doc- 
trine is an echo of the arch-rationalist Hegel. A denial that terms have 
referential, noncontextual meaning is the essence of Hegel’s coherence 
theory of knowledge. Does the nomological network disguise in empiricist 
trappings a resurgence of the old idealistic metaphysics? In any case, the 
extreme view (18) that no term has meaning apart from a system of laws 
is not generally embraced by the critics of definition. The anti-empiricist 
implications of a view which bases meaning wholly on context at the 
expense of extrasystematic descriptive reference are too potent. The view 
is unpalatable both to the scientist who tries to describe the world and 
to the philosopher who tries to show how the scientist accomplishes this. 
Its more moderate advocates grant that some of the terms of the network 
are definable in terms of observable events (10, 12). Those which al- 
legedly are not thus definable are held to be connected to these by chains 
of laws. They are thus partially coordinated to the realm of experience. 

Logical and philosophical technicalities apart, this is the gist of the 
arguments against explicit definitions. To some, they are convincing. 
Others find them irremediably confused (3, 4). In particular, the formula 
that “the meaning of a concept is the set of laws in which it occurs” is 
charged with confusing issues of fact with those of meaning. This confusion 
is abetted by an ambiguity in the term meaning. 


Meaning and Significance 


In one sense of that word, meaning is something we give to a concept 
when introducing it for the first time. It is what we agree to call a certain 
attribute or cluster of attributes. Thus it is purely verbal or, as one says, 
a matter of convention that dog refers to a barking rather than to a meow- 
ing creature. On the other hand, the generalization or law that “dogs are 
carnivorous” is a matter of fact and not a matter of the way we use words. 
Yet, on the nomological-network view, meat eating is part of the meaning 
of the concept dog. As we know more about dogs, we expand the meaning 
of the concept. 

To be sure, in one sense of meaning, a concept is more meaningful to 
us the more we know about it. But what is the it about which we know 
more and more? This can be answered only by distinguishing between two 
different meanings of meaning (2, 3). One of these is the empirical referent 
for which the term is introduced. This is purely verbal. The other refers
to the set of laws in which a term occurs. This is an empirical matter. The 
word significance has been suggested for the latter. A term may have 
meaning without significance, but not conversely. Can we not add the new 
things we find out about something to its definition, thus giving more 
meaning to the term? Up to a point, we can. Suppose that the results of 
three different tests for intelligence are found to be concomitantly connected.
Intelligence is now redefined, so that it means not any one of
these test results, but all three together. The term has been redefined to 
keep up with our knowledge, but what has been gained by doing so? 

Suppose intelligence now means a certain level of vocabulary, arithmetic 
ability, and general information, or, perhaps, something presumed com- 
mon to all three, like linguistic ability. Yet, in order to be able to predict 
that an individual with high vocabulary is also good at arithmetic, we 
would still have to state separately the empirical law connecting these 
attributes. So nothing has been gained by packing everything we know 
into the meaning of the term. Suppose it happens that status as initially 
defined in one way and intelligence as defined in another are found to be 
connected. Do we want to coin a new word whose meaning will include 
both the referent of status and that of intelligence? 

This is the logical conclusion to which the notion of partial definition 
carries us. Yet it leads nowhere. Without independently asserting the 
empirical law connecting status with intelligence, the new concept is just 
a word for a cluster of characteristics. Without the statement of the law, 
we have no justification for stating that members of this cluster uniformly 
occur together. Nothing follows from definition or meaning alone. Only 
from laws can we make predictions. 

Furthermore, how could we have discovered that intelligence is con- 
nected with status unless these terms have independent meaning? Again, 
what is it that we have knowledge about? In other words, in order to 
discover the significance of a term, we must first know its meaning. Signifi- 
cance alone will never give us empirical science. If the system of laws is 
to be about the world, if it is to be a factual, descriptive system, its concepts
must also have meaning in the sense of designation or reference.

Nor can laws intelligibly be said to implicitly define their terms. The 
term implicit definition is a most misleading manner of speaking (8). If 
it makes sense at all, it makes sense only when one is speaking of the 
axioms of a formal system, that is, a system of marks on paper containing 
expressions like “X and Y,” to whose letter variables no meanings have 
been attached. For such a system, the axioms implicitly define its terms 
only in the sense that, showing structure tho no content, they delimit 
the range of possible meanings or the interpretations that can be given 
to the symbols of the system if true statements are to result. Replacing the 
letter variables by some empirical concepts gives true sentences. When 
replaced by others, the resulting sentences will be false. But there may 
be many alternative sets of empirical terms which will give either result. 
The term makes no sense when, as in empirical science, our system contains
not uninterpreted letter variables but empirical concepts, theoretical
or otherwise. 

What of the technic of partial interpretation, or of partially coordinating 
so-called theoretical constructs to terms that do name observables? This 
procedure was suggested by the example of atomic and subatomic physics, 
the only place in physical science where terms are introduced in this way. 
But it is the axiomatic system in which these atomic notions are em- 
bedded which is partially coordinated, not the atomic concepts them- 
selves. Some of the terms of this axiomatic system are coordinated to 
those of another system, like thermodynamics, all of whose terms are 
empirically defined. On the other hand, some of the terms, like electron 
and mass of electron, are not coordinated at all. They do not have 
partial empirical meaning. In this very special sense, they have no empiri- 
cal meaning at all. The situation is a very special one indeed (3). 

The uncompleted network of behavioral laws in which the theoretical 
terms occur will presumably come to include terms referring either to 
further behaviors, or to neurophysiological, chemical, or other such events. 
In any case, these other terms would, if they were not leaning on each other 
in the curious circular manner of the nomological network, refer to observ- 
able attributes. No matter which way you look at it, the behavioral scien- 
tist’s candidates for theoretical constructs, like hostility, aggression, morale, 
and the like, must have some referential meaning. If, on the one hand, 
meaning is confounded with significance and the meanings of these terms 
are given by the laws in which they occur, then for there to be any laws, 
these terms must have independent meaning. On the other hand, if these 
terms have no independent meaning, their connection to observables can be 
only verbal or definitional. Their empirical meaning is then given by this 
definition, which may, of course, be changed. 

Definitions, being merely verbal, are tautologies. Laws are empirical 
statements. If the latter are absorbed into the former, if everything is made 
a matter of meaning, we have not an empirical science but a structure of 
tautologies. This would make nonsense of the whole enterprise. How then 
does one come to hold this view? Practically speaking, behavioral scientists 
often cannot define their terms precisely. The precise definition of social 
attitudes or clinical states requires one to choose from an almost infinite 
variety of symptoms those which can be used reliably to define the term 
in question (8, 9). The list of behaviors which together enable prediction 
to other behaviors cannot yet be sharply terminated. Such terms thus 
have a fringe of vagueness or openness. To achieve greater reliability and 
significance, such terms are frequently redefined to include new factors 
or, for that matter, to drop out old ones. Loosely speaking, we say the 
original definition was only partial. Accurately speaking, we frequently 
abandon our definitions and propose new ones. The initial vagueness and 
consequent frequent redefinition is part of the hit-and-miss way a science 
progresses. But the logic of science is concerned with the principle for 
good concept formation. The psychology of discovery is something else
again. The purposes of neither science nor logical analysis are served by 
exalting the exigencies of research in a difficult area into a methodological 
principle. 

Causation 

What is the difference between a true and a spurious correlation? What 
is the difference between a causal connection and a merely accidental 
conjunction of events? Despite immediate appearance to the contrary, 
these are not quite the same questions. Researchers accustomed to working 
with statistical correlations have developed technics for distinguishing the 
true from the spurious correlation. A high correlation, for instance, 
between female marital status and job absenteeism is said to be true, 
while that between marital status and candy consumption is called spurious. 
In the latter case, the introduction of an additional factor, age, leads us 
to abandon the original correlation. In the true case, on the other hand, 
the introduction of an additional factor, increased housework, is said to 
confirm the correlation. Why, in each case, do we treat the original corre- 
lations differently after introducing the additional factor? 

After all, marriage is statistically correlated with age, thus also with 
candy eating. Statistically, therefore, in both cases there actually is a cor- 
relation and both are explained by the third factor. Married people eat 
less candy because they are older; married women are absent more from 
jobs because they have more housework. We justify saying that never- 
theless the former correlation is spurious and the latter is true, because 
we analyze the notion of a true correlation in terms of a presumed causal 
connection. Getting married causes more housework, which in turn causes 
increased absenteeism, so getting married is truly correlated with absentee- 
ism. Getting older, on the other hand, is a common cause both of marriage 
and eating less candy. All concomitants of age, like grey hair and 
paunchiness, would give a high correlation with eating less candy, if age 
does. They have a common cause, but are not causes of each other. Thus, 
the difference between true and spurious correlations resolves into a 
difference between causal and noncausal connections. 
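
The statistical half of this argument can be checked on a small table. The sketch below uses counts invented purely for illustration; they are not data from any study. The third factor may be read either as age (with the outcome "eats less candy") or as increased housework (with the outcome "absent from the job"). The function names are likewise hypothetical.

```python
from fractions import Fraction

# Hypothetical counts, invented for illustration only:
# (third_factor, married, outcome) -> number of people.
counts = {
    (1, 1, 1): 56, (1, 1, 0): 24, (1, 0, 1): 14, (1, 0, 0): 6,
    (0, 1, 1): 6,  (0, 1, 0): 14, (0, 0, 1): 24, (0, 0, 0): 56,
}

def outcome_rate(married, third=None):
    """Share of people with outcome = 1 among those with the given marital
    status, optionally restricted to one level of the third factor."""
    rows = [(k, n) for k, n in counts.items()
            if k[1] == married and (third is None or k[0] == third)]
    total = sum(n for _, n in rows)
    hits = sum(n for k, n in rows if k[2] == 1)
    return Fraction(hits, total)

def rate_gap(third=None):
    """Outcome rate of the married minus that of the unmarried."""
    return outcome_rate(1, third) - outcome_rate(0, third)

overall_gap = rate_gap()
gap_within_levels = [rate_gap(third=level) for level in (0, 1)]
print(overall_gap, gap_within_levels)
```

On these assumed counts the married and unmarried differ by 6/25 overall, and the difference is zero within each level of the third factor. The same table serves equally well under either reading of the third factor, so the figures alone do not mark one correlation as true and the other as spurious.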

Nor does the difference between causal and noncausal conjunctions arise
only for statistical correlations. Nonstatistical generalizations, asserting for 
all things of a certain kind that they are uniformly connected with some- 
thing else, also raise the same problem. “All gases expand when heated” 
states a true causal connection, while “All the books on my desk are blue” 
does not. Philosophers have puzzled about how to distinguish the truly 
causal connections from those which are merely accidental. In particular, 
it has been pointed out that the usual formulation of an empirical law 
as an if-then statement does not reveal this distinction. Both the real law 
and the accidental conjunction would each be expressed as conditional 
statements. If anything is a book on my desk, it is blue; if anything is a 
heated gas, it expands. The conditional states the observed constant conjunction
of these characteristics.

The analysis of statements like “A causes B” into statements about a
uniform conjunction of events, without using the term cause, has been, ever 
since David Hume, basic empiricist doctrine. This analysis follows from 
the empiricist criterion of meaning, of which operationism is merely an 
application. All that we observe is the constant conjunction of the events 
“A” and “B” and not a third thing called a cause. 

Idiomatically, we may distinguish between accidental and causal con- 
nections by using the subjunctive mood. If this gas were heated, it would 
expand. On the other hand, if a book in the bookcase were on my desk, it 
need not be blue. It has, therefore, been suggested that only by means of 
the subjunctive can we distinguish lawful connections from mere generali- 
zations (13, 17). “A causes B” or “If A then B” is an empirical law only 
if we can truly assert the corresponding subjunctive, “If anything were 
A, then it would be B.” To put it differently, if the corresponding subjunc- 
tive is true, then we have a real connection, otherwise only an accidental 
generalization. This seems a rather neat solution. Unfortunately, it has 
some obvious difficulties. 

How are we to know the truth or falsity of the corresponding subjunc- 
tive? Fundamentally, there are only two alternatives. One is that we know 
it by inductive generalization from observation. But we observe only that 
whenever we have A, we also have B. The subjunctive, therefore, can be 
asserted only on the basis of prior knowledge of the indicative condi- 
tional. But then the subjunctive is superfluous since the evidence for it is 
no different from the evidence for the corresponding indicative condi- 
tional. The alternative is that we know the truth of the subjunctive in some 
special way. The empirical evidence for both the causal connection or true 
empirical law and the accidental conjunction is never more than a finite 
number of instances. There are thus no observations distinguishing the 
truth of the subjunctive from that of the indicative form. If, therefore, the 
subjunctive says more than the corresponding indicative and if this excess 
meaning is not further analyzable, we must know it in some special way. 

We must somehow grasp or see that one subjunctive is true while another 
is false. On the empiricist analysis, a law of nature is expressed by the 
indicative if-then form. Rejection of this analysis leads us down the 
path of rationalistic intuition or reason. I mentioned before that one’s 
principle of proper concept formation or criterion of meaning was funda- 
mental. We see now why this is so. On the unanalyzable subjunctive view 
of empirical laws, we are, in effect, back to an unanalyzed notion of cause 
and to rejection of the empiricist criterion of referential meaning. In- 
ductive generalization gives way to intuitive grasp of real connections. Is 
this really the price we must pay for the ability to distinguish between 
lawful and accidental uniformities? Clearly, this is a distinction we should 
like to be able to make. Fortunately, this can be done without sacrificing 
empiricist views on meaning and knowledge. But the distinction cannot
be made simply by considering generalizations by themselves, in isolation
(3, 6, 7, 20).

The difference between a law and an accidental conjunction of events
is a matter of fact and not of meaning. For matters of fact, it is reasonable 
to point out that we must look at the context, that is, at the rest of what we 
know. Let us reanalyze the difference between the spurious correlation 
between female marital status and candy consumption and the true corre- 
lation between female marital status and absenteeism. Why in this latter 
case does an additional factor, increased housework, confirm the correla- 
tion? The answer can be given without the use of cause. Introducing the 
third factor, more housework, leads to two new generalizations: When a 
woman marries, she has more housework, and if housework increases, so 
does absenteeism. From these two generalizations, the correlation in ques- 
tion between marriage and absenteeism follows as a deductive consequence. 
It is thus explained by them, in the only precise meaning explanation has 
in science. Because we can explain the correlation by deducing it from 
other generalizations, we consider it to be a true one. On the other hand, 
in the spurious case, the additional factor, age, does not permit such de- 
duction or explanation. Again we have two new generalizations, namely, 
age correlates with marriage and age correlates with candy consumption. 
But from these two generalizations, all that logically follows is that age is 
correlated both with marriage and with candy consumption. We cannot 
derive the correlation between marriage and candy consumption. When 
we define explain precisely, we see that the new factor, age, does not ex- 
plain the correlation. That is why it is abandoned as spurious. 
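
The contrast between the two cases can be checked mechanically. The sketch below rests on a simplification that the text does not make: each generalization is treated as a plain if-then statement about one person, whereas the candy example speaks of correlations. The helper names `implies` and `entails` are illustrative only. It tests whether the conclusion holds under every truth assignment that satisfies the premises.

```python
from itertools import product


def implies(p, q):
    """Material conditional: 'if p then q'."""
    return (not p) or q


def entails(premises, conclusion, n_vars):
    """True if every assignment satisfying all premises satisfies the conclusion."""
    for values in product([False, True], repeat=n_vars):
        if all(p(*values) for p in premises) and not conclusion(*values):
            return False
    return True


# True correlation: m = married, h = more housework, a = absent from work.
lawful = entails(
    [lambda m, h, a: implies(m, h), lambda m, h, a: implies(h, a)],
    lambda m, h, a: implies(m, a),
    3,
)

# Spurious correlation: g = older, m = married, c = eats less candy.
spurious = entails(
    [lambda g, m, c: implies(g, m), lambda g, m, c: implies(g, c)],
    lambda g, m, c: implies(m, c),
    3,
)

print(lawful, spurious)  # True False
```

In the first case the conclusion is a deductive consequence of the two new generalizations. In the second it is not: a young married woman who eats candy satisfies both premises about age and still violates the conclusion.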


Theories 


A theory is a deductively connected set of generalizations. A generali- 
zation is a law if it is part of a theoretical structure. The generalizations 
serving as premises are laws because they permit the derivation, hence 
the prediction and explanation of other laws. If a generalization either 
predicts or is predicted by other laws, the evidence for it is more than 
the mere conjunction of its observed instances. It is for this reason that 
we state firmly that if a gas were heated, it would expand. We assert the 
subjunctive because the law about the expansion of gases is not due to 
mere enumeration of instances. On the other hand, neither is it due to any 
unanalyzable connection between temperature and expansion. Rather, we 
believe this to be more than a mere conjunction because it is part of the 
theory of thermodynamics. It both implies and is implied by many other 
highly confirmed statements. Until we know more about how an isolated 
correlation is connected with other facts and generalizations, we cannot 
tell whether it is true or spurious, to use the statistical jargon. The de- 
cision about whether we have a law or an accidental conjunction thus 
depends upon further empirical knowledge and is not a matter of intuitive 
insight or grasp of real connections among things.


Models


The recent literature of behavioral research is replete with models. The 
time is clearly more than ripe for a thoro logical analysis of their nature 
and function. Yet, such attention as philosophers of science have recently 
given to models is disappointing. Except for one highly technical treat- 
ment (6), philosophical discussions, like those of scientists themselves, 
are rather more hortatory than clarificatory. Models, we are assured, are 
the “core of discoveries” (19). How, and in what sense, are they the 
core? Beyond reiterating that they provide a way of conceiving or thinking 
of phenomena, a way of speaking (16, 19), no real clues are offered to 
the logical connections between the model and the theory for which it is 
a model. Nor are we told precisely how it is that models help us explain, 
beyond apparently providing a feeling of familiarity. Optics, we are re- 
minded, uses a geometrical model. It deals with optical phenomena by 
the use of geometrical pictures. But what exactly is the connection between 
geometry and the physics of light rays? Are the pictures and diagrams 
really an essential part of the model? How, in general, can the laws of 
one area, like geometry or physiology, be a model for the laws of another 
area, like optics or psychology respectively? Is the term model always 
used in the same way? How do models differ from theories? Questions 
like these and many more must be answered, if we are to understand 
the nature and function of models. 

The fact is that the term model is used most ambiguously. Nor is 
mathematical model any more precise since this term, too, covers different 
things. Broadly speaking, there are two major uses of model. The most 
general use is as a synonym for theory. A scientific theory is a deductively 
connected set of laws or generalizations, some of which, the axioms, logi- 
cally imply others, the theorems. A theory may be well or ill confirmed, 
narrow or broad in scope, quantified or nonquantified. Model is now fre- 
quently used for those theories which are either highly speculative or 
quantified, or, most likely, both. Thus, a guess about the connections 
between quantified variables of an area, like psychology or economics, 
will frequently be called a mathematical model. Such hypotheses are 
mathematical only in the sense in which physics is mathematical. That is, 
they are empirical generalizations whose variables are quantified, so that 
we can say how much one variable changes with changes in others. They 
share the virtue of all quantified theories in permitting more precise de- 
duction and prediction. 

Quantification, however, is no guarantee of scope. In areas where be- 
havior depends upon many different variables, we may indeed pay for 
quantification with triviality. But then, nonquantified guesses at theories, 
like the doctrines of psychoanalysis or speculations about the physiological 
concomitants of behavior (which are often broader than quantified theories
but lack their precision) are also frequently called models. Such speculative
theories, whether quantified or not, are after all just theories. The
term model serves no particular purpose beyond, perhaps, emphasizing
the tentative, unconfirmed nature of the hypotheses in question. This 
usage would be harmless enough if it were not the case, as it unfortu- 
nately is, that there is another, quite different prevalent use of the term. 


Isomorphism 


Strictly speaking, I should have said two further uses (9). For in this 
second meaning of the term two different things are really involved tho 
they have a common feature. This feature I shall now explain. A miniature 
train is a model of a real train if it is isomorphic with it. Isomorphism re- 
quires two conditions. First, there must be a one-to-one correspondence 
between the elements of the model and the elements of the thing of which 
it is the model. For every chimney stack, there is a miniature chimney 
stack; every window has its replica, and conversely. Second, certain rela- 
tions are preserved. For instance, if a door is to the left of a window in 
the original, their replicas are similarly situated; also, the model is con- 
structed to scale. If the model works on the same principle as the original— 
if, for instance, a model steam engine is also steam propelled—the iso- 
morphism is complete. Extending this notion to theories, a precise meaning 
of model may be formulated. The form of a law is given either by the 
verbal if-then formulation or by an equation. If, for example, weight is 
a linear function of height and if supply is a linear function of demand, 
then these laws have the same form tho different content. The content 
is given by the empirical terms. Two theories whose laws have the same 
form are isomorphic or structurally similar to each other. 
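
The two conditions can be stated as a small check. This is a sketch only: the parts of the train, the relation "is to the left of," and the function name `is_isomorphism` are all illustrative choices, not anything taken from the literature cited. The function tests one proposed correspondence. It does not search for one.

```python
def is_isomorphism(mapping, elements_a, relation_a, elements_b, relation_b):
    """Check the two conditions for isomorphism between two small structures.

    mapping     -- dict sending each element of A to an element of B
    relation_a  -- set of ordered pairs over A (here: "is to the left of")
    relation_b  -- set of ordered pairs over B
    """
    # Condition 1: one-to-one correspondence between the elements.
    images = list(mapping.values())
    one_to_one = (
        set(mapping) == set(elements_a)
        and set(images) == set(elements_b)
        and len(images) == len(set(images))
    )
    if not one_to_one:
        return False
    # Condition 2: the relation is preserved under the correspondence.
    translated = {(mapping[x], mapping[y]) for (x, y) in relation_a}
    return translated == set(relation_b)


train = {"door", "window", "chimney"}
train_left_of = {("door", "window"), ("window", "chimney"), ("door", "chimney")}

toy = {"toy door", "toy window", "toy chimney"}
toy_left_of = {
    ("toy door", "toy window"),
    ("toy window", "toy chimney"),
    ("toy door", "toy chimney"),
}

replica = {part: "toy " + part for part in train}
faithful = is_isomorphism(replica, train, train_left_of, toy, toy_left_of)

# A toy whose door and window are swapped keeps the parts but not the relation.
swapped_left_of = {
    ("toy window", "toy door"),
    ("toy door", "toy chimney"),
    ("toy window", "toy chimney"),
}
swapped = is_isomorphism(replica, train, train_left_of, toy, swapped_left_of)

print(faithful, swapped)  # True False
```

The same test applies to theories when the elements are the descriptive terms and the relations are the connections that the laws assert among them.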

If the laws of one theory have the same form as the laws of another 
theory, one may be said to be a model for the other. This is the second 
most general meaning of the term. The laws of one area may suggest 
hypotheses about the form of laws in another area. The notion of model 
as isomorphism of laws is obviously symmetrical. However, when an 
area about which we already know a good deal is used to suggest laws 
for an area about which little is known, the familiar area providing the 
form of the laws may be called a model for the new area. Thus, the biologi- 
cal theory of evolution may be used as a model for social theory. 
Servomechanisms, like the automatic pilot or thermostat, are now fre- 
quently evoked models for learning and purposive behavior. 


Testing Models 


How does one test these suggested models? First, it must be possible 
to state clearly what is in one-to-one correspondence with what. Organisms 
grow; that is, they increase in size and weight. What is social growth? 
Relatively precise meaning can be given to adaptive and nonadaptive 
characteristics of organisms within evolutionary theory. Can we give cor- 
respondingly precise meanings to these notions for human institutions? 
Once clearly defined empirical concepts are made to correspond to the 
terms of the model, structural similarities, if any, are sought. Nutrition 
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is connected with growth in biology. Are the social concepts correspond- 
ing to nutrition and to growth similarly connected? In other words, not 
only must the terms of the two areas correspond, but the connections be- 
tween them must also be preserved if the model is to be of any use. An 
area, either part or all of it, can be a fruitful model for another only if 
corresponding concepts can be found and if at least some of the laws con- 
necting the concepts of the model also can be shown to connect their 
corresponding concepts. 


Arithmetical Models 


Replacing all the empirical, descriptive concepts in the theory of one 
area by those of a different area results in another theory with the same 
form but content different from the original. The isomorphic sets of laws, 
those of the model and of its translation, are both empirical theories whose 
truth or falsity depends upon the facts. It is possible, and often highly 
desirable, to establish another kind of isomorphism, in which the result 
is not two empirical theories sharing a common structure. Instead, the 
laws, or some of them, of an empirical theory may have the same form 
as a set of purely arithmetical truths. If this is the case, the latter is called 
an arithmetical representation of the empirical theory. Mathematical model 
sometimes means just this sort of arithmetical representation of an empirical
theory. The laws of arithmetic, rather than those of another empirical
theory, may be used as a model when, for instance, it is desired
to rank or measure the variables of an area. If, like the integers, the em- 
pirical terms of an area obey the axioms of order, they may be ranked. 
For instance, one of these axioms states the transitivity of the arithmetical 
relation, greater than, among integers. This means that if one number is 
greater than a second and the second is greater than a third, the first num- 
ber is also greater than the third. Replace the integers by names of in- 
dividual people; replace the arithmetical relation, greater than, by the 
empirical relation, smarter than. If the statement resulting from this trans- 
lation is true, the empirical relation of one person being smarter than an- 
other is transitive. If, in addition to this axiom, smarter than also satisfies 
the other axioms of order, individuals may be ranked by this relation. 
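
The translation just described can be tried on a small case. The sketch below assumes that "the other axioms of order" are asymmetry and connectedness, so that the relation is a strict ranking without ties. That reading, the three names, and the function names are assumptions made for illustration.

```python
def is_transitive(relation):
    """If (a, b) and (b, c) hold, then (a, c) must hold as well."""
    return all(
        (a, c) in relation
        for (a, b) in relation
        for (b2, c) in relation
        if b == b2
    )


def is_strict_total_order(people, relation):
    """Asymmetric, connected, and transitive over the given people."""
    asymmetric = all((b, a) not in relation for (a, b) in relation)
    connected = all(
        (a, b) in relation or (b, a) in relation
        for a in people
        for b in people
        if a != b
    )
    return asymmetric and connected and is_transitive(relation)


def rank(people, relation):
    """Return people from highest to lowest, or None if no ranking exists."""
    if not is_strict_total_order(people, relation):
        return None
    wins = {p: sum(1 for (a, _) in relation if a == p) for p in people}
    return sorted(people, key=lambda p: wins[p], reverse=True)


people = ["Ann", "Bob", "Cy"]
smarter_than = {("Ann", "Bob"), ("Bob", "Cy"), ("Ann", "Cy")}
cyclic = {("Ann", "Bob"), ("Bob", "Cy"), ("Cy", "Ann")}

print(rank(people, smarter_than))  # ['Ann', 'Bob', 'Cy']
print(rank(people, cyclic))        # None
```

Whether a given empirical relation behaves like the first set or the second is a question of fact. The check only reports what the observed pairs permit.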

If the variables of an area are quantified and obey not only the axioms 
of order but also further axioms for the addition of integers, measure- 
ment is also possible (1, 5). Measurable descriptive properties are those 
having the same structure as the addition of numbers. The measurability 
of descriptive properties is expressed by a set of empirical laws which are 
isomorphic to the laws of arithmetic. By virtue of this isomorphism, num- 
bers may be assigned to the properties of things, resulting in quantified 
empirical laws. Other parts of arithmetic serving as models for empirical 
properties are, for example, probability theory and the theory of games 
(9). A correspondence is established between the empirical concepts and 
those of the arithmetical theory. 
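
Additivity can be illustrated in the same spirit. The sketch below is an idealized example that does not come from the sources cited: samples of water are mixed, volumes are taken to add exactly, and the temperature of a mixture is taken to be the volume-weighted mean. The function name `is_additive` and the sample values are hypothetical.

```python
def is_additive(measure, combine, objects, tol=1e-9):
    """True if measure(combine(a, b)) equals measure(a) + measure(b) for all pairs."""
    return all(
        abs(measure(combine(a, b)) - (measure(a) + measure(b))) <= tol
        for a in objects
        for b in objects
    )


# Each sample of water is (volume in liters, temperature in degrees C).
samples = [(1.0, 20.0), (2.0, 50.0), (0.5, 80.0)]


def mix(a, b):
    """Idealized mixing: volumes add, temperature is the volume-weighted mean."""
    volume = a[0] + b[0]
    temperature = (a[0] * a[1] + b[0] * b[1]) / volume
    return (volume, temperature)


volume_additive = is_additive(lambda s: s[0], mix, samples)
temperature_additive = is_additive(lambda s: s[1], mix, samples)

print(volume_additive, temperature_additive)  # True False
```

Under this operation of combining, volume has the structure of the addition of numbers and temperature does not.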

Whether or not true empirical laws result from this correspondence
depends upon the facts. Many properties are not transitive and thus can- 
not be ranked. Many properties are nonadditive and thus cannot be meas- 
ured. If, however, the empirical variables do share the same structure as 
the laws of arithmetic, all the arithmetic theorems can be used to make 
deductions from these quantified laws to other laws and facts. When a 
model, either empirical or arithmetical, is used as a source of hypotheses 
about the connections among the variables of another area, it does not 
explain these hypotheses. It merely suggests their form. If, however, these 
new hypotheses are confirmed, they may be used to explain and predict 
new knowledge.


CHAPTER II
Research Methods: The Cross-Cultural Method 


GEORGE W. GOETHALS and JOHN W. M. WHITING 


A good method, to paraphrase Guthrie, is at best a tool by which
explanation is furthered. In the final analysis a theory or method 
stands or falls upon its utility. The cross-cultural method presents a 
number of paradoxes which the reader must keep in mind as he con- 
siders it as a possible tool for research. First, altho the method has been 
in existence for some 70 years, not until recently has either its scope or its 
usefulness been recognized. Second, unlike many areas of research, there 
are relatively few studies which have been published utilizing this particu- 
lar method. An evaluation of a methodology usually is based upon a large 
amount of published research which the reviewer can refer to and evaluate. 
Such is not the case with this particular method. However, despite the 
paucity of published research this particular method has had a great effect 
upon behavioral science, and in relation to recent work may have shown 
itself to be the most sensitive methodology available for those who wish to 
accomplish interdisciplinary research in the behavioral sciences. Since 
this is the first article dealing with the method in the Review, some
recapitulation of its history is appropriate. This is particularly true since
the cross-cultural method is often confused with another valuable approach
to the study of behavior, namely, the comparative or culture-personality
study.


Historical Background and General Discussion 


To gain perspective, it is interesting from a historical point of view to 
note that this method antedates the birth in 1912 of modern psychology 
as delimited by Woodworth (44) and Boring (7) and the advent of modern 
clinical psychology usually identified as beginning with the work of Binet 
in 1895 (33). The cross-cultural method can best be understood if two facts 
are kept in mind. First, since the first study by Tylor (37) in 1889, this 
has been a method employing statistical technics to test theory. Second, 
while the cross-cultural method in recent times has been concerned with 
matters of personality development in different cultures, it differs markedly 
from the comparative study because of its insistence upon testing theo- 
retical positions. 

The work of Honigmann (18), Kardiner (20), Kardiner and others 
(21), Mead (27), Mead and Metraux (28), and Hallowell (14), while 
generally highly sophisticated in its approach to the investigation of the 
theoretical problems, has in common one dimension which clearly separates 
it from cross-cultural research. This is that these comparative studies
consistently concern themselves with determining whether a particular
hypothesis, derived from some segment of theoretical insight, can be fitted into
some more general cultural framework. The implication of such a technic 
is that the investigator accepts without question the particular theory 
under discussion and is exemplifying it upon a broader canvas. The few 
quantitative facets of such studies which do exist have to do almost entirely 
with determinations of normative or modal behavior. 

The cross-cultural method, on the other hand, is always concerned with 
the test of some theory. Generally it uses quantitative technics described 
by Stephenson (36) as “R methodology,” and further, these are in keeping 
with designs derived from the thinking of Fisher (10). Very simply, these 
technics are concerned with testing the significance of some form of cor- 
relation. If this difference is kept in mind at the start, much of the con- 
fusion between the two technics will be avoided. Those who are interested 
in detailed analyses of the difference between these two useful approaches 
are referred to work by Allinsmith and Goethals (1), Lewis (24),
McClelland (25), Murdock (30), Whiting (39), and Whiting and Child (40).
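
One simple form such a test can take is sketched below. It assumes, for illustration only, that each society has been rated for the presence or absence of two traits and that the association is tested with a chi-square on the resulting fourfold table. The counts are invented and come from none of the studies cited. The function name `phi_and_chi_square` is hypothetical.

```python
import math


def phi_and_chi_square(a, b, c, d):
    """Association in a 2x2 table of societies.

                     trait Y present   trait Y absent
    trait X present         a                 b
    trait X absent          c                 d
    """
    n = a + b + c + d
    cross = a * d - b * c
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    phi = cross / math.sqrt(margins)
    chi_square = n * cross ** 2 / margins  # Pearson chi-square, 1 degree of freedom
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return phi, chi_square, p_value


# Invented counts for 50 societies.
phi, chi_square, p_value = phi_and_chi_square(18, 7, 6, 19)
print(round(phi, 2), round(chi_square, 2), p_value < 0.01)  # 0.48 11.54 True
```

A significant value says only that the two ratings go together across the sample. It does not say whether the societies are independent cases, which is the problem of diffusion that later designs try to control.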


Cross-Cultural Studies Testing Aspects 
of Evolutionary Theory 


The first cross-cultural study was read at a meeting of the Anthropologi- 
cal Institute of Great Britain, presided over by Sir Francis Galton. This 
was a research by Tylor (37) investigating the development of laws of 
marriage and descent. The study was concerned with testing constructs 
derived from the theory of evolution. In 1915 Hobhouse, Wheeler, and 
Ginsberg (16) completed a study testing hypotheses which predicted re- 
lationships between certain social institutions and stages of economic 
development. After another lapse of 25 years Simmons (34) and Murdock 
(29) investigated other constructs relating to kinship derived from evo- 
lutionary theory. These studies are important in that their sophistication 
and use of newer statistical technics foreshadowed later studies. 


Cross-Cultural Studies Testing Behavioral Theory 


During the past 20 years there has been growing interest in the inter- 
disciplinary responsibility for developing a general science of behavior, as 
evidenced by attempts to bring together the most profitable and rigorous 
ideas from the fields of anthropology, psychoanalysis, and experimental 
psychology. One of the best reviews of such efforts was presented by All- 
port (2). One of the first to see the possibilities of this rich body of theory 
was Ford (11, 12). His study of human reproduction, because of his 
normative treatment, is not in the strictest sense of the word a
cross-cultural research; however, his methods of arriving at ratings of various
behaviors in different cultures have been used in one way or another by all
succeeding cross-cultural researchers (13). 


Within the limits of our definition, that is that a cross-cultural study 
must test some hypothesis, the first study testing behavioral theory was that 
of Horton (19). He employed the cross-cultural method to investigate the 
relationship between the drinking of alcoholic beverages and anxiety. His 
findings revealed that alcohol reduced inhibition in all societies unless 
specific measures were taken to prevent it. Other researches followed; the 
most complex was Murdock’s fascinating work (30), published in 1949, 
which tested many hypotheses concerning kinship terminology and its 
relation to such phenomena as forms of marriage, descent, and social 
structure generally. 


Whiting’s early study (38) reporting the relationship between sorcery 
and social control is of great historical importance. The steps that Whiting 
took in constructing her research foreshadowed the more sophisticated 
methodology which was to become part of later studies, and at the same 
time attempted to answer some of the original criticisms of the method. 
First Whiting, on the basis of her field work with the Paiute Indians and 
from behavioral theory in general, derived a number of hypotheses having 
to do with the function of sorcery as a means of social control. Once these 
hypotheses were stated, the next step was to examine the same variables in 
a number of other cultures and to test to see if they had the same re- 
lationship they had had in the original single culture. One of the important 
aspects of the study was that she tried to rate independently the variables 
under consideration and thus to make sure that any relationship between 
them was a result of a theoretical interaction rather than rater bias (42). 
She also undertook to control the geographical distribution of her sample 
of culture so that her findings could not be attributed to cultural diffusion. 
This progression from the single case to testing hypotheses derived there- 
from, with reference to a sample of cultures, and then controlling for the 
contamination of the findings as a result of geographical concentration 
reveals the eventual design of later cross-cultural studies. 


Behavioral theory has given rise to a number of studies all marked by 
their richness and rigor. Barry (5) tested the relationship between the 
education of the child and art forms. Wright (45) demonstrated the re- 
lationship between the content of myths and education as these both relate 
to aggression. McClelland and Friedman (26) demonstrated the relation- 
ship between certain child-training variables and need achievement. 
Finally, in 1953 Whiting and Child (40) showed the relationship between 
various technics of education and the development of super-ego and other 
manifestations of personality. 


As provocative as these early studies have been, they only hint at the 
limitless possibilities of the cross-cultural method. The methodology did not
come of age until recently; this was not possible until certain flaws in the
technics were rectified.


The Problem of Data


Whiting and Child (40) in their elaborate study relating socialization 
pressure in different societies to facets of adult personality, described pre- 
cisely how data are collected and utilized in any cross-cultural study. De- 
spite this lucid description it is interesting that most of the criticisms of the 
cross-cultural method have been focused not so much upon the method 
itself, as upon the data various investigators have used. This criticism 
evolves from two preoccupations, one within and one outside the discipline 
of anthropology. Both of these dimensions are of concern and demand 
attention. Within anthropology, Spiro (35) and Kluckhohn (22) pointed 
out that the report of the field worker upon any given culture is no little 
affected by his conceptions of what anthropology as a science includes. 
Any given individual thus views a culture in terms of his background and 
training, and unless these are constant, any report of what is seen is open 
to the widest amount of variation. Criticisms by Lewis (24) and by Henry 
(15) concern a matter general to any investigation of behavior and relate 
to the control that can be introduced into any field investigation. 

There is no doubt that these observations have some validity; criticism 
of the use of ethnographic sources written at different times by people with 
a variety of backgrounds and personal predilections is a telling one. As a 
criticism, however, it would be far more serious if the practitioners of the 
cross-cultural method had shown themselves to be unaware of this difficulty 
and even more important had done nothing to remedy it. Many of the 
reservations concerning the cross-cultural method can be put into proper 
perspective when it is realized that no group has been more aware of the 
limitations of their data than those who have employed the method. 

Obviously, ethnographies already in existence cannot be completely re- 
written. However, such materials can be brought up to date in relation to a 
strict set of criteria. The publication in 1950 of the Outline of Cultural 
Materials (31) by the Human Relations Area Files provided a method of 
organizing the materials and for correcting some of the deficiencies which 
exist. The introduction to the Outline provides the person working in an 
applied field, such as education, an excellent overview not only of the 
processing of data but also of its application to everyday problems. 

However, the best possible way to answer the criticisms brought forward 
both by anthropologists and behavioral scientists in general was to under- 
take a program which answered both kinds of criticisms, that is, provide 
training so that a group of anthropologists could collect field data in the 
same way and provide training in the use of the method before going into 
the field. Investigators for three universities, Cornell, Harvard, and Yale, 
financed by grants from the Ford Foundation and the Social Science 
Research Council, have jointly since 1952 undertaken this dual task. From 
1952 until 1954, teams were trained before going into the field to undertake 
an investigation of socialization in five different cultures: in New England, 
Okinawa, Mexico, the Philippines, and in a Hindu culture. These field
teams returned in 1955 and since that time have been analyzing the data
collected under these carefully controlled and comparable conditions. The 
critics of the cross-cultural method will find most of their objections 
answered by a perusal of the field manual which specifies in detail the 
variables of the research and their method of measurement (43). 


The universality of the method is naturally its recommendation, and the 
kinds of observations made are consistent with one of the most intensive 
studies ever done on socialization in American culture (32). These studies 
of the five different cultures will be published within the next few years
and will exemplify the cross-cultural method at its newer level of
sophistication.


Recent Changes in Scope and Method 


As these new safeguards relating to existent data and the collection of 
new information were being undertaken, cross-cultural researches began to 
go forward into new areas. Hollenberg (17) and Faigin (9) tested a 
number of hypotheses relating to technics of education and the development
of the super-ego from a sample of three cultures in the Southwest.
In this study, as in the comparative study of values in five cultures directed 
by Florence Kluckhohn, John Roberts, and Evon Vogt of Harvard Uni- 
versity, the methodology is modeled on that used in conjunction with 
ethnographic literature but has been broadened to include field research. 
The approach is not the same as the elaborate collaboration previously 
described, but it, too, has the advantage of increasing the comparability 
of data and at the same time testing and retesting hypotheses based upon 
individual differences in dissimilar cultural settings. 

Another advance of the cross-cultural method within the scope of ethno- 
graphic materials is exemplified by the work of Ayres (4), Anthony (3), 
and Whiting, Kluckhohn, and Anthony (41). Essentially these studies are 
concerned with the interaction of various sets of variables rather than with 
a strict antecedent-consequent relationship between two variables. Ayres, 
for example, was concerned with the relationship between pregnancy 
taboos, family structure, and dietary regulations. Anthony was concerned 
with showing how initiation ceremonies were related both to child training 
practices and to aspects of the kinship organization of various cultures. 


The way Anthony went about his study is as important for this new kind 
of research as the steps Beatrice Whiting took in her study of sorcery. 
Anthony began with a construct from psychoanalytic theory discussed at 
some length by Bettelheim (6), who observed that certain cultures put 
young males thru extremely painful initiation rites before permitting them 
the status of men. Usually these are related to some painful operation 
involving the genitals, and typically the ordeal involves public circumcision 
without anesthesia. The behavioral scientist refuses to admit that such a 
traumatic event exists for capricious reasons. Bettelheim had seen this 
“symbolic wound” as being a way both to introduce the young male to
adulthood and to control his access to women before maturity. Anthony
proceeded to test this hypothesis and found that there was indeed a striking 
relationship between the rite de passage and the social organization of the 
group in which it took place. Unexpectedly, however, he found some im- 
portant relationships between kinship, education, and the forms of mar- 
riage. The tendency of a good cross-cultural research to generate new 
hypotheses is thus exemplified in Anthony’s research. 

Whiting and Kluckhohn, drawing upon the work by Anthony, showed 
the intricate relationship between ceremonies of initiation and many 
aspects of the social milieu which exists around them. Most important to the 
field of education is the light shed upon the variations in the intensity of the 
adolescent revolt, a phenomenon that is shown to be widespread. The ways 
in which other societies cope with this dilemma can help us gain new per- 
spectives on the problems of juvenile delinquency and the modes of tran- 
sition from childhood to adulthood. 


Implications of the Cross-Cultural 
Method for Educational Research 


The advantages of the cross-cultural method to the field of education are 
at least two: First and foremost it forces upon education the realization 
that there is a broad range of methods by which a child can be brought up. 
At the same time it shows that these different methods are neither chaotic 
nor “primitive” as some of the earlier reports of “strange happenings in 
the South Seas” would have indicated. Instead, child-rearing methods are 
functionally bound in a meaningful way to other parts of the life plan of 
the society. Polygamous societies, for example, and monogamous societies 
such as our own, have been shown to differ systematically in their methods 
of bringing up children. Further, these differences are not arbitrary or 
capricious but are consistent with findings derived from the current de- 
velopment of behavioral science which is the theoretical framework of the 
modern cross-cultural research. The second advantage to understanding the 
cross-cultural method is that above all else it causes us to be aware both of 
the virtues and of the limitations of the untrained observer, and much more 
important, it offers us ways of training people to look at the phenomena of 
behavior with a strategy of reason, logic, and objectivity. 

Such training and such perspective are important for education now as 
never before. Education in this country is committed not only to training 
the mind but essentially to socializing the person. The school faces prob- 
lems involving the emotions not only of the individual child, but also of 
groups of children as they come together. The field of education can no 
longer afford the luxury of being “culture bound.” It must accept, instead, 
the responsibility of knowing the best there is in research so that the new 
responsibilities before it may be accepted intelligently and thoughtfully. 
The authors are not suggesting, nor do they mean to imply, that the
cross-cultural method is an answer to every research problem in education. They
are suggesting, instead, that by seeing the evolution of this particular
method and the painful steps thru which it progressed, others now con- 
cerned with constructing a methodology adequate to the complexities of 
education may, by a review of what has been done in this area, avoid some 
mistakes thru knowing of the experiences of others. Finally, by seeing this 
progression, the educator may come to agree with Cumming and Cumming 
(8) that there is nothing more practical in an applied field than the 
possession of good theoretical research.


CHAPTER III
Research Methods: Experimental Design 


JULIAN C. STANLEY 


In his chapter three years ago, Kogan (41) illustrated a variety of
experimental designs by referring to 52 studies. My initial selected
bibliography contained 214 items, the seemingly most pertinent 83 of which are
reported here. I neither repeat any of Kogan’s references nor cover
material summarized by Gardner (30) and Moses (54) in their excellent
chapters.


Current Trends 


Modern experimental design is less than half a century old. Its
grandfather was William Gosset (“Student”); its very active father, Sir Ronald
Fisher; its bible, Fisher’s The Design of Experiments (27). Recently two
competitors, decision theory and information theory, have risen to
challenge the supremacy of the analysis of variance and covariance. Fisher
(28:69) spoke out against “the attempt to reinterpret the common tests of 
significance used in scientific research as though they constituted some kind 
of acceptance procedure and led to ‘decisions’ in Wald’s sense. . . .” Pearson 
(59) and Neyman (55) replied to some of his criticisms. Cochran (15) 
stated that it would be unduly restrictive and harmful to view the function 
of statistics wholly in terms of decision making. 

In a technical paper, Lindley (45) suggested that altho undoubtedly one 
reason for experimenting is to reach decisions, another is to gain knowl- 
edge about the state of nature in the information-theory sense of Shannon. 
He introduced a measure of the information that an experiment provides 
and formulated a rule of experimentation: Perform that experiment for 
which the expected gain in information is greatest and continue experi- 
menting until a predesignated amount of information is attained. 
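
Lindley's rule can be illustrated with a small discrete sketch. The prior, the two hypothetical experiments ("sharp" and "dull"), and their likelihood tables are invented for illustration and do not come from Lindley's paper. The measure used here is the expected reduction in Shannon uncertainty about the state of nature.

```python
import math


def entropy(p):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(q * math.log2(q) for q in p if q > 0)


def expected_information_gain(prior, likelihood):
    """Expected reduction in uncertainty about the state of nature.

    prior[i]         = P(state i)
    likelihood[i][k] = P(outcome k | state i) for one experiment
    """
    n_states = len(prior)
    n_outcomes = len(likelihood[0])
    gain = entropy(prior)
    for k in range(n_outcomes):
        # Marginal probability of observing outcome k.
        p_k = sum(prior[i] * likelihood[i][k] for i in range(n_states))
        if p_k == 0:
            continue
        posterior = [prior[i] * likelihood[i][k] / p_k for i in range(n_states)]
        gain -= p_k * entropy(posterior)
    return gain


def choose_experiment(prior, experiments):
    """Pick the experiment with the greatest expected gain in information."""
    return max(
        experiments,
        key=lambda name: expected_information_gain(prior, experiments[name]),
    )


# Hypothetical example: two states of nature, two candidate experiments.
PRIOR = [0.5, 0.5]
EXPERIMENTS = {
    "sharp": [[0.9, 0.1], [0.1, 0.9]],
    "dull": [[0.6, 0.4], [0.4, 0.6]],
}
```

Under these assumed figures `choose_experiment(PRIOR, EXPERIMENTS)` selects the "sharp" experiment. The second half of the rule, continuing until a predesignated amount of information is attained, would amount to repeating the selection with the posterior as the new prior.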

Garner and McGill (31) compared uncertainty analysis, following 
Shannon, with a two-way nonorthogonal analysis of variance and concluded 
that while the two are similar in many respects, uncertainty analysis should 
be used when the criterion variable has the properties of Stevens’ nominal 
or ordinal scale, while the analysis of variance must be employed when one 
desires to retain information about the metric and conditions for an in- 
terval or ratio scale are met. Frequently it may be desirable to do both 
analyses and compare the results. 
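
The central quantity of an uncertainty analysis can be sketched for a two-way table of frequencies. The function names are illustrative, and the code computes only the contingent uncertainty U(y:x) = U(x) + U(y) - U(x, y) in bits; it is not a reproduction of the full Garner-McGill partitioning.

```python
import math


def uncertainty(probabilities):
    """Shannon uncertainty, in bits, of a set of probabilities."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def contingent_uncertainty(table):
    """U(y:x) = U(x) + U(y) - U(x, y) for a table of joint frequencies.

    table[i][j] is the observed frequency in row category i, column category j.
    """
    total = sum(sum(row) for row in table)
    row_p = [sum(row) / total for row in table]
    col_p = [sum(row[j] for row in table) / total for j in range(len(table[0]))]
    cell_p = [cell / total for row in table for cell in row]
    return uncertainty(row_p) + uncertainty(col_p) - uncertainty(cell_p)
```

Because only category frequencies enter the computation, the measure makes no use of a metric on the criterion variable, which is why it suits nominal classifications.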

Box (5:975) recorded his opinion that “outside the field of agriculture
the sequential situation is by far the most common one.” Altho the
mathematical statistics of sequential experimentation have proved
formidable, Grundy, Healy, and Rees (32) presented a solution for a simple
version of the two-stage experiment: After the first experiment the
researcher decides whether or not he needs to perform a second experiment
whose extent depends upon the results of the first. Johnson (38) utilized
sequential procedures to discriminate between two hypotheses about the
ratio of variance components in a simple one-way classification. Ray (61)
published tables for sequential tests applicable to the one-way classification
and randomized blocks.

¹ A partially annotated mimeographed list of the other 131 may be obtained free from the author while
his supply lasts.

In a clear, explicit, heuristic article, Box (4) proposed a method of
process improvement to be run in the normal course of production by plant
personnel themselves, whereby industrial processes regularly yield
information on how the product can be improved. Educators might be repaid
amply for time spent pondering the applications of Box’s concepts to
educational “products.”


Fundamental Books 


Cochran and Cox (16) expanded their 1950 edition by 36 percent to 
cover recent developments. This expository handbook of designs is in- 
valuable to the experimenter who already has a year or so of background 
in statistics. Snedecor’s revision (67) should be quite helpful to educators 
who can translate agricultural examples into the jargon of their own areas. 
Davies (22) edited a large volume that, despite its title, has much relevant 
material; principles of experimental design are widely applicable. Likewise, 
the contributions of Bennett and Franklin (3), who devoted 280 pages to 
the analysis of variance and design of experiments, transcend the chemical 
industry. Perhaps Federer (24) tried to cover too much without presenting
design plans except in his examples, thereby making his treatment overly
concise and difficult for most educational workers. Finney (25) wrote
clearly but abbreviated excessively, providing little detailed help with 
computations. Ostle (57) included chapters on the analysis of variance 
and covariance and experimental design. A second edition of the Wishart 
and Sanders (82) manual appeared. 

Pearson and Hartley (60) tabled percentage points of the F distribution 
for the .25, .10, .05, .025, .01, .005, and .001 levels of significance and for 
the largest variance ratio, besides providing many other useful statistics. 


Applied Books 


McNemar (48) expanded his coverage of the design and analysis of 
experiments, especially with respect to statistical models. Cornell’s fresh 
new approach (19) merits adequate tryouts in one-year sequences; this 
book is a long step ahead of the usual textbook in educational statistics, but 
its sections on the analysis of variance demand supplementation along lines 
suggested by Stanley. A similar warning applies with even greater force to
Guilford’s revision (33). The chapter by Edwards (23) sets forth
elementary principles of experimental design.

Brunswik’s “representative design” (9) bears a certain resemblance to 
recent work on random factors and variance components (20, 64, 81) and 
therefore might be pursued profitably in those terms. 


Primarily Expository Articles 


Baker (1) described a systematic method for arranging and analyzing 
the results of factorial experiments that gives the mean square relating 
to each degree of freedom. It is applicable either to qualitative or to 
quantitative factors and is especially suitable for use with a desk calcu- 
lator. Stanley (70) emphasized the importance of design, stressed random 
assignment, and outlined the planning of two classroom experiments. 
Cochran (14) commented about what may well have been the largest 
experiment ever conducted, listing important factors in the polio field trials. 
Stanley (73) showed in detail how to analyze scores from counterbalanced 
examinations, explaining Latin and Greco-Latin square crossover designs. 
Campbell (11) set forth clearly and systematically considerations of group 
control design, with special attention to the role of pretests. 


Four Basic Articles 


Four long, well-written, clarifying articles of great importance to ex- 
perimenters, tho probably not easy reading for most educators, deserve 
careful study. Scheffé (64) surveyed the current state of the theory of 
alternative models in the analysis of variance, showing (on page 259) 
expected values of mean squares for the mixed model with dependent inter- 
actions. His results agree with those of other recent investigators (20, 81) 
but not with certain earlier recommendations. McNemar (48:309), for 
instance, included in his expected mean square for the random effect in a 
mixed model an interaction component of variance that Scheffé omits, 
thereby leading the former to a more conservative test of significance for 
the random effect. 
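
The disagreement can be made concrete for the two-way mixed model. The notation is an assumption chosen for illustration and is not transcribed from Scheffé's page 259: factor A is fixed with a levels, factor B is random with b levels, and there are n observations per cell. Authors define the variance components somewhat differently, so the display is a sketch of the form of the result only.

```latex
\begin{aligned}
E(MS_A) &= \sigma_e^2 + n\,\sigma_{AB}^2 + nb\,\sigma_A^2 \\
E(MS_B) &= \sigma_e^2 + na\,\sigma_B^2 \\
E(MS_{AB}) &= \sigma_e^2 + n\,\sigma_{AB}^2 \\
E(MS_{\mathrm{error}}) &= \sigma_e^2
\end{aligned}
```

The earlier recommendation adds the interaction component to the random effect, giving $E(MS_B) = \sigma_e^2 + n\,\sigma_{AB}^2 + na\,\sigma_B^2$. Under the first set of expectations the random effect is tested against the error mean square; under the second it is tested against the interaction mean square, which is the more conservative test.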

Cornfield and Tukey (20) dealt with average (expected) values of mean 
squares for several types of factorial designs, stating that they are abso- 
lutely essential to the choice of an error term—tho not sufficient, of course 
—and found that the customary expected mean squares (64: 259; 81: 963) 
resulted from their derivations under very general assumptions. Thus it 
may make sense for Hoyt (35) and others to talk in terms of variance 
components (or rather, of intraclass correlation) for dichotomously scored 
test items. 

Wilk and Kempthorne (81) continued their contributions to the analysis
of variance, defining experimental units as “those entities in an experiment
to which treatments are assigned at random” (page 951) and suggesting
that the term be extended to include periods of time, states of mind, and
other poorly defined complexes of conditions. They stressed the need for
random sampling of “levels” of a factor when the experimenter wants to
generalize to all levels, just as one samples randomly from a population of
experimental units if he wants to generalize beyond the experimental units
actually employed in the experiment. Wilk and Kempthorne showed how
unit-treatment interactions enter into expectations of mean squares,
pointing out that if the number of experimental units in the population of
experimental units is large, the bias caused by such interactions will be small.


We have long needed a thoro, authoritative treatment of the Latin square 
design. Wilk and Kempthorne (80) couched this in relatively easy 
language and symbolism, presented finite-model expectations of mean 
squares from which EMS’s for the other models can be derived quickly, 
and concluded that analyses of variance for Latin square designs may over- 
estimate the error term for treatment comparisons and underestimate the 
component of variance due to treatment main effects. Nevertheless, when 
both the Latin square and the randomized block designs are reasonable 
for a proposed experiment, they recommend the former, tho with caution, 
because its error term for the treatment mean square will usually be too 
large. 

Experimenters owe a debt of gratitude to Scheffé, Cornfield and Tukey, 
and Wilk and Kempthorne for the care they took to make these articles 
intelligible to the reader whose mathematics walks with a limp. 


Other Articles Concerning Expectations of Mean Squares 


In a note (71) and a review (19) Stanley listed and discussed
expectations of mean squares for finite, fixed, random, and mixed models,
explicitly for two- and three-way crossed classifications with equal numbers
of replicates per treatment combination. Extensions to one and to four or
more classifications can be made quickly on the basis of principles
outlined. Johnson and Stanley (39) exhibited the EMS’s for a mixed-model
design involving two independent groups of boys, every one of whom
responded to 16 projective cards into which were incorporated three
dichotomous factors, each level of which was represented twice: 2(2 x 2 x 2) =
16. They showed how randomization at several points in their investigation
was essential to the analysis employed. Medley, Mitzel, and Doi (49)
provided EMS’s for the three-way design without replication tho not via the
finite-model EMS’s.


Designs 


Technical journals, such as Biometrics, Biometrika, and Sankhyā, abound
with papers extending old experimental designs and proposing new ones.
The interested reader is referred to these journals and to Cochran and Cox
(16). Only a few articles of this type will be cited here.


Zelen (83) explained at a reasonably elementary level his new method
for analyzing data from incomplete block designs. Stanley (72) showed 
that Gellerman’s study was more complex than its author supposed, con- 
stituting a sort of split-plot crossover design, and explained how to analyze 
it. Stanley (69) used scores from two forms of a “satisfaction” inventory 
to compare crossover and noncrossover designs. Pearce (58) studied local 
versus remote effects of various treatments applied to different parts of the 
same organism. Morrison (53) illustrated several designs with at least five 
factors that make possible the testing of all main effects and two-factor 
interactions while requiring only half the number of observations of an 
analogous factorial design. Clarke (12) indicated how four 4 x 4 Greco- 
Latin squares might be used together to provide enough degrees of freedom 
for error. Stanley (73) dealt with completely permuted 3 x 3 Greco-Latin 
square designs. Collier and Stunkard (17) treated the same type of design 
as Stanley and Beeman (74) tho quite differently. 
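
The idea behind half-replicate designs of the kind Morrison illustrated can be sketched with the standard textbook construction. This is not necessarily any of Morrison's own designs: the defining relation I = ABCDE and the function names are assumptions made for illustration. The code builds the 16 runs of a half replicate of a 2 x 2 x 2 x 2 x 2 factorial and checks that the contrasts for all main effects and two-factor interactions are mutually orthogonal.

```python
from itertools import combinations, product


def half_fraction(k=5):
    """Runs of a half replicate of a 2^k factorial, coded -1/+1.

    The first k-1 factors form a full factorial; the last factor is set
    to the product of the others (defining relation I = ABCDE for k = 5).
    """
    runs = []
    for base in product((-1, 1), repeat=k - 1):
        last = 1
        for level in base:
            last *= level
        runs.append(base + (last,))
    return runs


def effect_columns(runs):
    """Contrast columns for all main effects and two-factor interactions."""
    k = len(runs[0])
    columns = {}
    for i in range(k):
        columns[(i,)] = [run[i] for run in runs]
    for i, j in combinations(range(k), 2):
        columns[(i, j)] = [run[i] * run[j] for run in runs]
    return columns


def all_orthogonal(columns):
    """True if every pair of contrast columns has zero inner product."""
    return all(
        sum(x * y for x, y in zip(u, v)) == 0
        for u, v in combinations(columns.values(), 2)
    )
```

With five factors the 15 contrasts are all orthogonal in 16 runs. With four factors the same construction confounds pairs of two-factor interactions, so `all_orthogonal(effect_columns(half_fraction(4)))` is false, which is consistent with the restriction to at least five factors.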


Components of Variance, Pooling Procedures, and Power 


Tukey (77, 78) tackled variance components with a new mathematical 
procedure. Bulmer (10) gave a simple, reasonably accurate formula for 
the confidence limits of variance components. Searle (65) employed matrix 
methods to find sampling variances of estimates of components of variance 
and covariance for a one-way classification with unequal numbers of ob- 
servations in the various classes. King (40) recommended that in one-way 
classifications of a random factor, the number of levels of the factor equal 
the number of observations per level in order to give nearly maximum 
power for testing the null hypothesis. Johnson’s study (38) has already 
been cited. 

Huntsberger (36) showed that a certain weighting procedure provides 
greater control over disturbances that might result from pooling sums of 
squares to secure an error term with greater degrees of freedom than does 
the familiar sometimes-pool method. Bozivich, Bancroft, and Hartley (8) 
examined critically for some random and mixed models the consequences, 
with regard to resulting errors of the first and second kind, of certain pool- 
ing procedures. They provided two qualified recommendations, concluding 
that “no rule of the form V,/V,> constant is very satisfactory” (page 
1040). 

Nicholson’s formula (56) for the power of the analysis of variance test 
holds when the denominator of the F ratio has an even number of degrees 
of freedom. Fox’s extensive charts (29) with a detailed example should be 
useful. Commins (18) published a table stating the size of sample needed
for various assumed values of population parameters and for four
probabilities that the study will yield significant results.


A Posteriori Comparisons of Means


Stanley (68) illustrated the “post-mortem” methods of Scheffé, Tukey, 
Dunnett, and Duncan for comparing various differences among means 
after an analysis of variance has been performed. Wallace (79) presented 
Tukey’s unpublished procedure, with applications and comments. Kramer 
(42, 43) extended Duncan’s multiple range test to include group means 
with unequal numbers of replications, adjusted means with heterogeneous 
variances and covariances, covariance analysis, incomplete block designs, 
lattices, and other situations. 


Applications of the Analysis of Variance to Tests 


Hoyt (35) generalized to test items not scored dichotomously his analy- 
sis of variance procedure for securing coefficients of equivalence. His result 
is algebraically equivalent to Cronbach’s alpha. Stanley commented on 
this procedure in his review (19). Moonan (50, 51, 52) showed how to 
ascertain the equivalence and stability of examinations and the interaction 
of items with methods in an experiment, using an orthogonal linear trans- 
formation due to Nandi. 
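
The algebraic equivalence noted above can be checked numerically. The sketch below computes the coefficient both ways for a persons-by-items table of scores; the function names and the small data set are invented for illustration.

```python
def cronbach_alpha(scores):
    """Coefficient alpha; scores[p][i] is the score of person p on item i."""
    n_items = len(scores[0])

    def variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

    item_variances = [variance([row[i] for row in scores]) for i in range(n_items)]
    total_variance = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


def hoyt_reliability(scores):
    """1 - MS(residual) / MS(persons) from a persons-by-items analysis of variance."""
    n_p, n_i = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_p * n_i)
    person_means = [sum(row) / n_i for row in scores]
    item_means = [sum(row[i] for row in scores) / n_p for i in range(n_i)]
    ss_persons = n_i * sum((m - grand) ** 2 for m in person_means)
    ss_items = n_p * sum((m - grand) ** 2 for m in item_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_residual = ss_total - ss_persons - ss_items
    ms_persons = ss_persons / (n_p - 1)
    ms_residual = ss_residual / ((n_p - 1) * (n_i - 1))
    return 1 - ms_residual / ms_persons


# Hypothetical scores: five persons, four items scored 0 to 4.
SCORES = [
    [3, 4, 3, 4],
    [1, 2, 2, 1],
    [4, 4, 3, 4],
    [0, 1, 1, 2],
    [2, 3, 2, 2],
]
```

The two functions agree to within rounding error for any complete table in which the persons differ in total score, since neither computation requires the items to be scored dichotomously.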


Nonparametric Approaches to the Analysis of Variance 


That current darling of psychologists, nonparametric statistics, is treated 
in Chapter VI of this issue, except as applied to the analysis of variance. 

Roy and Mitra (62: 374) attempted to make a clear distinction between 
a “variate” and a “way of classification” in order to differentiate between 
a “multivariate analysis” situation, an “analysis of variance” situation, and 
“something of a mixed type.” Hodges and Lehmann (34) found that the
asymptotic Pitman efficiency of the Kruskal-Wallis rank test when
compared with the F-test never falls below .864. They then investigated
alternative notions of asymptotic efficiency.

Sutcliffe (75) showed how to partition sums of squares and associated 
degrees of freedom for complex contingency tables of frequency data from 
multiple classification designs, quite analogously to the analysis of vari- 
ance. It is interesting to compare this method with the Garner-McGill 
uncertainty analysis (31) based upon information theory. 

McNemar (47) analyzed seven sets of data both by the analysis of 
variance and Kellogg V. Wilson’s method, which is similar to Sutcliffe’s 
but less general, to show that the latter has considerably less power. One 
might add that with Wilson’s partitioning of chi-square the interaction 
term may be negative. 

The upshot of this seems to be that while nonparametric analogues of
the analysis of variance are valuable for frequency data, one must be
careful not to throw out the baby with the dirty bath water in the interests of
simplifying computations.


The Analysis of Covariance


In an important paper, Cox (21) compared covariance analysis with 
blocking (matching with respect to a covariable) and concluded that 
methods based upon covariance are preferable to blocking only if the r 
between the covariate (x) and the dependent variate (y) is at least .6. But 
if we suspect that the treatment effects are not independent of x—that there 
is a treatment by x interaction—we should ordinarily prefer to use x 
quantitatively. 

Truitt and Smith (76) examined methods for making covariance adjust- 
ments in split-plot experiments and testing main effects for significance. We 
have already mentioned the work of Kramer (42) and Searle (65). 


Pairing 

Jackson and Fleckenstein (37) compared the Thurstone-Mosteller, 
Scheffé, Bradley-Terry, and Gulliksen methods for analyzing data based 
upon paired comparisons, concluding that while all four procedures give 
about the same results, each has advantages for certain situations. 

For both the fixed and the mixed models, Lev and Kinder (44) offered 
analysis of variance formulas applicable to a group of several subjects 
observed in the presence of each of the other subjects of the group, the 
entire set of possible pairings having been repeated on several occasions. 
Runkel, Smith, and Newcomb (63) presented a method for computing the 
interaction effects on variables measured by observing interacting pairs of 
persons, where not all possible pairs of subjects need be observed. 


Miscellaneous 


In a long, technical article, Box and Hunter (7) continued the
development of “Boxism,” introducing the concept of the “variance function” for
an experimental design and defining “rotatable designs.” Cochran (13)
discussed combining estimates from several experiments and gave
examples.

Box and Andersen (6) found that while the analysis of variance test 
for groups of equal size is both remarkably “robust” (insensitive to ex- 
traneous factors not being tested) and “powerful” (sensitive to the specific 
factors being tested), Bartlett’s test for the homogeneity of a set of 
variances is affected drastically by departures from mesokurtosis. For 20
variances based upon 9 d.f. each, Bartlett’s test yields a significance level of 
.718 when kurtosis is 2, instead of the “correct” .05 value! For kurtosis of 
-1 the corresponding figure is .000004. The authors applied permutation 
theory to the problem of comparing variances to secure a robust test. 

Fisher (26) warned against choosing one transformation of data in 
preference to another because of computational ease without considering 
how well it conforms to theoretical considerations.

Smith (66) stepped in to resolve the long-standing discussion in
Biometrics about whether the missing plot estimate should be considered simply a
number to be placed in the empty space or an estimate of the lost
observation. He pointed out that the standard error to be attached to the estimate
depends upon what one intended a priori to estimate.

Articles on graphic methods by Barnes, Pearson, and Reiss (2) and 
Lyle (46) are well worth perusing. 


Concluding Remarks 


Many of the contributions to experimental design during the past three 
years should be incorporated rapidly into statistics textbooks designed for 
students in education and psychology. Authors of such books need the 
ability and willingness to translate into simpler but still accurate form 
relevant material published by mathematical statisticians. Then by
studying for at least a year, and preferably longer, under a well-qualified
instructor, graduate students may come to understand the rudiments of
experimental design. To do less than this and still hope for properly designed
experiments is asking for a miracle.


CHAPTER IV
Research Methods: Status Studies and Sample Surveys 


ROSEDITH SITGREAVES and HERBERT SOLOMON 


The many technical articles published in recent years on the theory and
methodology of sample surveys and the rise in the number of status 
studies demonstrate the need for a separate chapter on these subjects. In 
this connection, it is appropriate to mention that Cornell (16) also pre- 
pared a separate chapter on this increasingly important area in a previous 
Review. Altho, as indicated in the earlier chapter, the theory and practice 
of sample surveys developed in other fields, education will obviously be 
one of the biggest users of the method as a research tool. For example, the 
recent large foundation grants for research in almost all aspects of educa- 
tion usually demand status studies to determine the present position of 
education in our culture and so provide a basis for comparison and future 
evaluation. 

The major impetus for status studies, and the consequent attention to 
technics of sample surveys, has been and still is the need for vital statistics 
which can be used for social research and the determination of public pol- 
icy. The report on the evaluation of Salk polio vaccine (96), the earlier 
Kinsey report, and the ever current election straw polls have made the 
general newspaper-reading public aware of sample survey technics and 
have indicated the growing need for more statistical sophistication in stu- 
dents in education and social research. It is interesting to note that the 
sampling design used in the 1954 field trial evaluation of Salk polio vac- 
cine contained flaws similar to those reported by Student (76) in his assess- 
ment of the sampling design for the Lanarkshire milk experiment of 1930 
(a large-scale status study to contrast the effects of raw milk and pas- 
teurized milk on the heights and weights of elementary-school children in 
Scotland). An evaluation of the design of the polio vaccine field trials was 
presented by Brownlee (9), and sampling problems in the Kinsey reports 
were discussed (37). A rather recent use of the results of sample surveys 
has been their introduction as legal evidence in judicial decisions. Deming 
(21) discussed some of the problems involved in these situations. 

For purposes of exposition, status studies can be artificially classified into
two divisions according to goals. One goal can be illustrated by the
operations of United States Government agencies in doing status studies on
many national social and economic characteristics for the purpose of
collecting and publishing vital statistics. The results of these status studies are
then available for government, industry, labor, educational groups, and
others to be used as aids in policy making. The fact that some of these
studies have become routine operations in no way indicates that the
associated sampling design and nonsampling problems have been conquered. A
second purpose of status studies is best illustrated by the aforementioned
Salk vaccine evaluation trials. Here a specific goal is in mind before the
experimentation begins, and the design is prepared for this one effort.

No matter which goal prevails, the methodological problems are basi- 
cally the same. Usually some nonsampling restrictions to fit a specific situa- 
tion are placed on a random sampling mechanism. The most elementary 
situation of this type is known as stratified random sampling. Other
designs, such as systematic sampling, cluster sampling, and multistage
sampling, are now described and developed in several texts, among them
those by Sukhatme (78) and Hyman (33). Chapters on
sample design and analysis can also be found in compendiums of technics
for social research such as that by Young (100).

A number of papers on theoretical design and assessment have been 
published and are discussed below. Much of the emphasis in these papers
has been on the preparation of a sampling design which minimizes the 
variability of an estimate, or the cost of a survey, or both. In addition, 
papers concerned with sampling problems in “dynamic studies,” that is, 
measurements of a trait over time, and papers concerned with measure- 
ment problems related to responses to questionnaires and inventories, are 
beginning to appear. In the main, the present chapter reports on papers 
published since the middle of 1954, but it also contains some earlier ar- 
ticles of interest. 


Status Studies 


The U. S. Bureau of the Census undoubtedly leads the way in the ap- 
plication of sample survey technics to status studies in education. Four 
regularly reported measurements of interest to educational researchers are 
school enrolment (88, 89, 90, 91, 92), employment of students (83, 84, 
85), summary of government finances (93, 94, 95), and school districts in 
the United States (86, 87). As one can notice and expect, these status 
studies represent the collection of vital statistics rather than single studies 
motivated by specific educational problems. 

The use of sample surveys to explore specific educational issues is in- 
creasing. Holland (31) made a survey of approximately 700 freshmen and 
sophomores at Michigan State College to assess their perceptions of the 
instructor. A stratified random sample of approximately 1000 eighth- 
graders in Kansas schools was collected by Zack (101) to determine the 
influence of socio-cultural characteristics on educational opportunities in 
public-school instrumental music. Another stratified random sample of 87 
fifth-grade teachers from public elementary schools in Minnesota was used 
by Johnston (34) to determine the achievement of objectives of elemen- 
tary-school science. A survey to obtain information for teaching materials 
for food and nutrition classes was made by Thrift and Ward (80). Inter- 
est in the impact of social stratification on occupational expectations of 
twelfth-grade Michigan boys led Youmans (99) to a sample of 6800 youths 
from 56 public and private high schools. Hunter (32) studied the attitudes
of public-school teachers in a large Southern city toward school and living
conditions thru a questionnaire submitted to a population of about 2000.
Tumin (82) used a stratified sampling design involving cluster sampling 
to study the effect of the exposure to mass mediums of communication on 
readiness for desegregation among white males 18 years old and older in 
Guilford County, North Carolina. 

Two studies of a noneducational nature, which are of interest from the
standpoint of sampling procedures, were concerned with transportation 
flow data (74) and the estimation of the Brazilian coffee crop (75). 


Theoretical Design and Assessment of Sampling Errors 


There is obviously a close relation between the development and assess- 
ment of sampling designs and the properties of various estimates calculated 
from the resulting sample data. Indeed, the efficiency of any sampling plan 
is usually assessed in terms of the variances of the resulting estimates. For 
purposes of discussion, however, an attempt has been made to divide the 
references in the broad area of theoretical design and assessment into two 
groups depending upon whether the primary emphasis of the paper is on 
the sampling design or on the proposed method of estimation. It is believed 
that such a division, altho slightly arbitrary in some instances, will be 
useful in highlighting recent developments in this area. 


Sampling Design 


Two expository papers presented the basic notions of population sam- 
pling in nonmathematical terms. Slonim (73) covered briefly but clearly 
the concepts of simple random sampling; he also discussed estimation 
procedures and sampling and nonsampling errors. Jones (35) discussed 
the meaning and purpose of sampling, the relative merits of different 
sampling procedures, and methods of minimizing the costs of random 
sampling. Both authors drew illustrations from their respective fields, 
namely, the Air Force and the telephone industry, and the papers should 
prove generally useful in providing insight into sampling theory and 
procedure. 

Expository discussions of some specific sampling problems were given 
by Deming (20) and Dalenius (18). Deming presented a simplified
procedure for the selection of a sample and for the numerical computation
of the standard errors from the returns. Dalenius discussed the various
methods proposed for determining sample sizes in stratified random 
sampling when the survey is designed to provide information on more 
than one variable. 


A general discussion of stratified random sampling was given by 
Aoyama (1). The author considered, among other things, the selection 
of controls for stratification, the influence of stratification on the estimate, 
and a method of analysis of data based on a modification of Tchebycheff’s 
inequality. 







December 1957 Status STUDIES AND SAMPLE SURVEYS 





In a paper on two-stage sampling, Sen, Anderson, and Finkner (71) 
reported on an empirical investigation of various stratified two-stage 
sampling systems for estimating totals of certain agricultural items in 
North Carolina. The investigation represented the application of theory 
developed by Sen to the selection of two primary sampling units without 
replacement from a stratum where one of the units is selected with prob- 
ability proportional to size and the other with equal probability. 


The problem of planning a two-stage sample involving multiple cor- 
related characters was considered by Chakravarti (11). A model was form- 
ulated for the problem, and three procedures for determining optimal 
sample sizes were given, depending upon particular conditions to be opti- 
mized. A numerical illustration was also given. 

In other papers on two-stage sampling, Brooks (6) considered the
estimation of an optimum subsampling number when the ratio of the
variances within primary units is not known but must be estimated. Rangarajan
(61) compared two methods of selecting second-stage units from
primary units, one in which the number of second-stage units is fixed in
advance and a second in which the expected number of such units is fixed
in advance.


Multistage sampling plans were discussed by Banerjee (2), Basu (3), 
Cansado (10), Raj (57), and Roy (65). 


An extension of present sampling theory to the problem of sampling 
over time was considered by Eckler (22). The problem of interest is 
that of estimating the time-dependent mean of a population. In such a 
case information contained in earlier samples may be used to improve the 
current estimates, provided the various samples have some elements in 
common. A plan for sampling over time such that some old elements 
are eliminated and new elements are added each time a sample is drawn 
is called rotation sampling. Three methods of rotation sampling were 
described in the paper and compared on a cost basis. 

Among papers on special sampling designs was one by Patterson (54) 
which compared four methods of selecting a lattice sample. Krishna Iyer
and Singh (41) considered distance travelled as one factor in a lattice 
design. Sen (70) investigated a multivariate sampling design in which 
successive observations were not independent. Raj (58) studied the selec- 
tion of two overlapping samples for multipurpose surveys. 

Papers by a number of authors dealt with the choice of sample sizes. 
Yanedo (98) gave a rule for choosing in stratified sampling between 
sample sizes proportional to stratum sizes and sample sizes proportional 
to stratum sizes weighted by estimates of the stratum standard deviations. 
Optimum allocation, that is, the allocation minimizing the variance of 
the required estimate, for a successive sampling plan involving correlated 
variables was considered by Tikkiwal (81). Putter (55) considered an 
optimal linear decision rule for allocating the sample sizes in the second 
stage of sampling from a stratified normal population. Grundy (29) 
discussed a method of stratified sampling with probability exactly
proportional to stratum size.
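
The two allocation rules contrasted above can be written out in a few
lines. The Python sketch below is a minimal illustration, assuming known
stratum sizes and estimated stratum standard deviations; the function
names and the numbers are hypothetical, and the results are left as
fractions rather than rounded to whole sample sizes.

```python
def proportional_allocation(n, stratum_sizes):
    """Sample sizes proportional to stratum sizes N_h."""
    total = sum(stratum_sizes)
    return [n * N_h / total for N_h in stratum_sizes]

def weighted_allocation(n, stratum_sizes, stratum_sds):
    """Sample sizes proportional to N_h * S_h (size times standard deviation)."""
    weights = [N_h * S_h for N_h, S_h in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)
    return [n * w / total for w in weights]

sizes = [500, 300, 200]     # hypothetical stratum sizes
sds = [2.0, 5.0, 10.0]      # hypothetical estimates of stratum standard deviations
prop = proportional_allocation(100, sizes)
weighted = weighted_allocation(100, sizes, sds)
```

In this made-up case the weighted rule shifts sample toward the small but
highly variable third stratum, which is the situation in which the two
rules differ most.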

In a comparison of sampling with and without replacement, Kozniewska 
(40) concluded that sampling without replacement was more efficient for 
unstratified random sampling. Rios (62) demonstrated that the expected 
number of distinct elements in sampling with replacement is greater than 
or equal to the number in sampling without replacement when the vari- 
ances of the corresponding averages are equated. Singh (72) stated that 
in a stratified design, the selection of primary units without replacement 
is usually more efficient when the number of primary units is two, but is 
not necessarily so in other situations. 
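
The comparison for unstratified sampling can be checked numerically with
the standard textbook formulas for the variance of a sample mean and for
the expected number of distinct elements in draws with replacement. The
Python sketch below is one numerical case under assumed values, not a
proof and not the argument of any of the papers cited; all names and
numbers are hypothetical.

```python
def var_mean_with_replacement(sigma2, n):
    """Variance of the mean of n draws with replacement."""
    return sigma2 / n

def var_mean_without_replacement(sigma2, n, N):
    """Variance of the mean of n draws without replacement from N units.

    sigma2 is the population variance computed with divisor N.
    """
    return (sigma2 / n) * (N - n) / (N - 1)

def expected_distinct(n, N):
    """Expected number of distinct units in n draws with replacement."""
    return N * (1 - (1 - 1 / N) ** n)

N, n_wor, sigma2 = 100, 20, 4.0
# Number of draws with replacement giving the same variance as n_wor draws
# without replacement (generally not a whole number).
n_wr = n_wor * (N - 1) / (N - n_wor)
distinct_wr = expected_distinct(n_wr, N)
```

For these values the with-replacement design needs more draws, and the
expected number of distinct elements it yields is at least the
without-replacement sample size.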


In investigating area sampling in agricultural problems, Mokashi (47) 
compared four types of sampling designs for estimation of timber volume 
per acre. A cluster sampling plan for estimating crop acreage was also 
considered by this author (49). 

Moser (50) described the principal developments in the sampling of 
human populations in Great Britain during the past five years, discussing 
changes in methodology together with new applications. Zarkovic (103) 
discussed sampling methods in the Yugoslav 1953 population census. 


Among the other papers listed, Chapman (12, 13) and Chapman and 
Junge (14) considered probability models and sampling methods for 
biological populations which are often mobile in space and difficult of 
access. Sundrum (79) discussed a method of systematic sampling based 
on ordered properties. Matthai (46) considered the selection of random 
numbers for large-scale sampling. Jones (36) discussed the use of random 
subsample means to evaluate variability and possible bias in samples. 
Rios (63) considered several problems of maximums and minimums in 
sampling from a finite population. Kitagawa (39) discussed some prob- 
lems in survey design. 


Methods of Estimation 


The problem of estimating the total value of a character in a finite 
population on the basis of a sample was considered by several writers. 
Raj (60) proposed unbiased estimates when sampling units are selected 
with varying probabilities without replacement, and gave exact
expressions and unbiased estimates for the corresponding variances. The same
author (59) also studied the properties of ratio estimates in sampling with 
equal and unequal probabilities. Ronge (64) compared ratio and linear 
estimates of a population total when data on the value of the variable 
are known for an earlier time. 
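
For readers unfamiliar with the ratio estimate of a total, the Python
sketch below contrasts it with the simple expansion estimate under equal
probability sampling. It assumes an auxiliary variable x (for example, the
value at an earlier time) whose population total is known; the function
names and data are hypothetical and are not taken from the papers cited.

```python
def expansion_estimate(y_sample, N):
    """Estimate the population total as N times the sample mean of y."""
    return N * sum(y_sample) / len(y_sample)

def ratio_estimate(y_sample, x_sample, x_total):
    """Estimate the population total of y as X * (sum of y / sum of x)."""
    return x_total * sum(y_sample) / sum(x_sample)

# Hypothetical data: N = 5 units, known earlier-period total X = 100.
y_sample = [11, 22]
x_sample = [10, 20]
expansion = expansion_estimate(y_sample, N=5)
ratio = ratio_estimate(y_sample, x_sample, x_total=100)
```

The ratio estimate uses the relation between y and x observed in the
sample, so it can differ considerably from the expansion estimate when the
sampled units have unusually small or large x values.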

Das (19) considered the estimation of a population total and its vari- 
ance in a finite population when the estimate was based either on a 
specified type of two-stage sampling or on sampling with varying prob- 
abilities. Sen (68) also discussed the estimation of the variance in a 
finite population. 

More general types of linear estimates were studied by Godambe (27) 
and Raj (56). Three papers of Masuyama (42, 44, 45) concerned esti- 
mating a total area on the basis of areal samples. 

A general procedure for constructing unbiased estimates of the mean 
value of a variate in a finite population for a specified two-stage sampling 
design was given by Sandelius (67). A procedure was also given for 
providing unbiased estimates of the variances of the estimates. A minimum 
variance unbiased estimate of the mean of a given group when samples 
are also available from correlated variables in other groups, was con- 
structed by Narain (52). Properties of a sample mean were studied by 
Bennett (5). 

Studies of the variance of estimates were made for ratio estimates in 
stratified sampling by Mokashi (48), for unbiased estimates in cluster 
sampling by Yamamoto (97), and for Gini’s mean difference in samples 
from a finite population by Salvemini (66). An outline of a general theory 
for estimating variability among strata for a specified sampling scheme 
was given by Sen (69). 

Technics for estimating the intercensal population of counties were given 
by Brown (7) and Crosetti and Schmitt (17). A method of adjusting 
census estimates to agree with other available census data was discussed 
by El-Badry and Stephan (24). 


Nonsampling Errors 


Initial emphasis in sample survey technics was on sampling designs 
which would yield unbiased estimates of the desired population values. 
These designs, over the years, have been refined and elaborated to take 
advantage of possible patterns of variability in the population. Most of 
these designs, however, began with the assumption that once an individual 
was selected for the sample, the desired information was obtainable in 
an accurate and reliable form. 

It was early recognized, however, that problems of nonresponse and 
accuracy of reported data existed. In fact, in discussions of the usefulness 
of sample surveys as opposed to complete enumeration, an argument fre- 
quently advanced was that estimates based on sample data collected by 
a small number of trained workers were preferable to counts based on 
complete enumeration carried out by a large number of untrained people. 

Discussions of nonsampling errors are now appearing more frequently 
in the literature. Sukhatme (77), for example, discussed the measurement 
of observational errors in surveys. A sampling procedure to deal with the 
problem of nonresponse in mailed questionnaires was developed by El- 
Badry (23). The effects of nonresponse were also considered by Brownlee 
(8) and Cohen and Lipstein (15). 

In some instances, the accuracy of reported information is checked by 
taking a second sample from the original data and collecting more detailed 
information for the individuals in the second sample. Thus, Kish and 
Lansing (38) reported on discrepancies in the estimates of market values 
of homes made by the home owners and by professional appraisers. 
Zarkovic (102) discussed the use of sampling methods to evaluate the 
accuracy of literacy data. The validation of morbidity survey data by 
comparison with hospital records was studied by Belloc (4). 

The effect of memory on reported responses was investigated by Gray 
(28) using data from the British Survey of Sickness. An attempt to 
measure errors due to editing of questionnaires in a census was made by 
Nordbotten (53). The effect of ignorance on opinions of economic and 
social issues was considered by Ferber (25). Ferber (26) also investi- 
gated the consistency of replies of various family members. The error 
in crop-cutting experiments due to the bias on the border of the grid was 
discussed by Masuyama (43). 

Hansen and others (30) reported on a redesign of the Current Popula- 
tion Survey to provide for a more efficient system of field organization 
and supervision as well as on some advances in methods. 

A study by Myers (51) of the accuracy of age reporting in the United 
States concluded that accuracy has improved over the past 70 years par- 
ticularly for native-born white males. He stated, however, that there is 
need for improvement in the nonwhite population, and that in general, 
reporting of ages by women is significantly less accurate than such report- 
ing by men. 


CHAPTER V 


Research Tools: Library Resources 


LOLA R. PIERSTORFF 


This description of library resources and bibliographical technics brings 
up to date the similar chapter by Good (14) in the December 1951 issue 
of the Review, covering the materials published since June 1951. The 
topics treated include (a) library services, manuals, and general aids; 
(b) guides to books and periodical literature; (c) guides to theses and 
selected research projects; (d) serial and occasional bibliographies and 
summaries; and (e) institutional directories or handbooks.* 


Library Services, Manuals, and General Aids 


Brickman (5) again reviewed the chief reference works in education. 
Barton (3) prepared another revision of her brief guide to reference 
books. Shores (33) gave a detailed discussion of basic reference sources 
with sections on references by type and subject-matter area. Two
supplements to Winchell and Johnson’s bibliographic guide to general reference 
books (44) were completed. 

A research methods bibliography has been a part of Research Studies 
in Education (7) since the 1953 issue. Seeger (32) gave very practical 
suggestions on the use of library resources in educational research. 

Kinney (20) produced a guide to bibliographical style manuals and their 
use in documentation and research. Campbell (9), Dugdale (12), and 
Turabian (38) revised their manuals for writers; these publications deal 
chiefly with style. McCrum and Jones (24) wrote a manual devoted to 
bibliographical procedures and style used in the Library of Congress. 


Guides to Books and Periodical Literature 


The guides to books included the annual selection of outstanding edu- 
cational books (10, 29) and books in education considered significant 
for an eight-year period (45). 


Educational Measurement 


The Fourth Mental Measurements Yearbook (8) remained the most 
nearly complete listing of tests and their evaluations; it also listed and 
reviewed books in this field. It covered the period 1948 thru 1951. 


* A mimeographed bibliography of references additional to those given in this chapter is available 
from the author while the supply lasts. 
Psychology 


Six more volumes of the Annual Review of Psychology (1) were issued. 
While the contents varied slightly from volume to volume, there regularly 
were reviews of areas interesting to educational workers, for example, 
learning, counseling, statistical research and design, educational psychol- 
ogy, and child psychology. A new book-review journal (11) with author, 
reviewer, title, and subject index of selected books was started. Latham 
(22) reviewed guides to literature in psychology. 


Textbooks 


Textbooks in Print (35), formerly American Educational Catalog, is 
a compilation of all textbooks, with excellent indexes by author, title, 
and subject. 


Periodicals 


Tangible evidence of the importance of periodical literature was pro- 
vided in a study by Saunders (31) for UNESCO. For example, he found 
that the total number of journals referred to in the Encyclopedia of Edu- 
cational Research approached 400. He concluded that the indexing of 
periodical literature in the social sciences and humanities was not as 
satisfactory as that in the natural sciences and that altho education was 
one of the best served social sciences, there was considerable room for 
improvement. Another UNESCO publication (39) listed educational re- 
search journals in 44 countries. 


Guides to Theses and Selected Research Projects 


Beginning in 1956, the Trotier-Harman yearly index of doctoral dis- 
sertations (37) was consolidated with Dissertation Abstracts (26). The 
latter then became the standard annual comprehensive list of doctor’s 
dissertations. 

Periodic compilations of titles of dissertations, theses, reports, and field 
studies in education were begun. Blackwell (4) listed research in education 
and educational psychology presented for higher degrees in the United 
Kingdom and Ireland beginning in 1918. Brown, Lyda, and Good (7) 
began an annual listing of doctoral dissertations completed and under 
way in education together with a research methods bibliography for the 
year. The first section, altho possibly more helpfully arranged, substan- 
tially duplicated Trotier and Harman (37) and later, the corresponding 
section of Dissertation Abstracts. The second section continued the re- 
ports originated in the Phi Delta Kappan (17); Good’s bibliographies 
are helpful. Lamke and Silvey (21) began an annual classified listing of 
titles of master’s theses in education presented for degrees in the United 
States and Canada beginning in 1951-52. The compilers expressed the 
hope that making such research generally known might help give it 
stature. 

Indexes and abstracts of foreign physical education literature were 
made available beginning in 1955 (18). The Clearing House for Research 
in Child Life changed to semiannual publication (42) and no longer lists 
certain kinds of medical studies. UNESCO (39) listed educational re- 
search bibliographies and directories in 44 countries. Bibliographies of 
dissertations in fields related to education are easily located in the well- 
known Education Index; therefore they will not be listed here. 


Serial and Occasional Bibliographies and Summaries 


Other continuing bibliographies or summaries included guidance (16), 
occupational literature (13), audio-visual education (23), history of edu- 
cation (6, 19, 43), psychiatric books (25), reading (15, 36), teacher 
education (2), articles on education in lay magazines (27), criticisms of 
education (34), and questionnaire studies in education (28). 


Institutional Directories 


New education directories or handbooks were started on the international 
(30, 40, 41) and the national (46) levels. 


Summary 


This chapter has been devoted to technics and procedures for work 
with reference tools in the library. Research theory has always been 10 
to 20 years ahead of practice for many reasons. The development of action 
research, study councils, a variety of new reference tools, and summaries 
of research have been particularly valuable in stimulating application. 


CHAPTER VI 


Research Tools: Observing and Recording 
Group Behavior 


MARVIN TAYLOR and HAROLD E. MITZEL 


During the period covered by this Review, discussion and experimental 
activity concerned with group behavior continued at a vigorous rate. 
Altho the years cannot be characterized as encompassing great originality 
in methodological construction, it seems safe to argue that they have been 
dominated by continuous refinements of useful observational methods 
and by increased knowledge of variables which act to bias and distort the 
observations themselves. A noticeable characteristic has been the con- 
tinued use of observational and recording technics for such practical pur- 
poses as assessment and selection programs. 

This review of the literature represents only a sample of the numerous 
research studies in the area. The authors have tried to limit their study 
to experimentation which more or less directly involves in its design the 
exploration and/or refinement of technics of observing and recording 
group behavior. It should also be noted that the total literature on this 
topic is not represented because of the exclusion of studies reported in 
languages other than English. The authors regret their lack of access to 
material from this expanding source. 


Measurement of Group-Membership Interaction 


During the period under examination, energy was invested in locating 
and studying variables which affect group-member interaction. Jensen 
(30) suggested a seven-faceted conceptual framework for observing both 
the social structure and the interaction within classroom groups. The 
dimensions for study were (a) problem solving, (b) authority-leadership, 
(c) power, (d) friendship, (e) personal prestige, (f) sex, and (g) priv- 
ileges. The remainder of this review will deal with the various approaches 
to these areas. 

At Ohio State University a 10-year period of interdisciplinary research 
was described in a series of monographs. One report in this series was 
by Stogdill and Shartle (51) who studied administrative and leadership 
relationships within an established, hierarchal organization. They de- 
scribed the development of an interview technic which yields sociometric- 
type data, and a set of scales to measure the leader’s perception of his re- 
sponsibilities, authority, and delegations (RAD scales). Several other 
technics which were developed will be discussed in a later section. In a 
second monograph, Fleishman and others (17) produced statistics relevant 
to a scale called the Leadership Behavior Description Questionnaire 
(LBDQ). This scale was used in an industrial setting by foremen and 
their subordinates and was useful in predicting and describing the
relationship between these individuals. Halpin (22) described the LBDQ 
as a two-dimensional scale, containing 40 items selected from a factor- 
analysis matrix. It was designed to measure the behavior of a leader in 
terms of how he “initiates structure in interaction” and how much “con- 
sideration” he exhibits. Halpin further pointed out that the scale had been 
used fruitfully in measuring the effectiveness of leaders in military, in- 
dustrial, and educational settings. Hemphill (27), at one time associated 
with this project, utilized five or six scales in a study of the relationship 
between behavior of college department heads and the reputation of the 
departments for being well administered. In addition to the RAD and 
LBDQ scales already described, of particular interest is the Group Di- 
mensions Description Questionnaire. In his manual Hemphill (26) stated 
that this instrument was designed to elicit a respondent’s perceptions, 
attitudes, and feelings about his group. By combining every group mem- 
ber’s scores it was also possible to obtain a profile of the major dimen- 
sions which characterized the group as seen by its members. Hemphill 
stated that the scale samples 13 dimensions of group structure and 
reported data to support claims for reliability and validity. 

Shevitz (48) investigated Hemphill’s hypotheses regarding leadership 
origins in three-man groups by obtaining data from two trained observers. 
The observers utilized behavior categorized according to a scheme worked 
out by Hemphill and the Ohio State group. Olmsted (44) developed a 
22-item fixed-response questionnaire to evaluate the adequacy of leader- 
ship of the formal group leader. Of particular interest may be the measure 
of “leader favorableness” which was claimed to be independent of the 
usual halo effect found in such measures. 

A more recent development was the emphasis on peer- and self-ratings. 
Webb (59) studied the relationship between self-ratings and objective 
measures of intelligence. The rating device used was very simple, requiring 
subjects to rank their peers and themselves into a normalized scale from 
most to least intelligent. The results prompted Webb (58) to develop a 
novel form of self-group ratings which would yield high reliabilities for 
self-measures. In this technic, “Self-plus-minus,” the individual compares 
himself with every other member of his group on a particular trait. The 
result is a greater number of self-ratings and higher reliabilities. Mayo 
(38) studied the relationship between peer-ratings and halo effect and 
reported that not all of the variance found was attributable to halo effect. 
Suci, Vallance, and Glickman (52) found that peer-rating reliabilities were 
not markedly affected by variation in the objective basis of the choice, or 
by the rater’s liking certain members of his group. Hoffman and Rohrer 
(28) developed a peer-evaluation scale. The score on this scale may be 
generalized to groups outside the reference group from which the score 
was obtained. Buchheimer and Pendleton (7) studied the Group Partici- 
pation Scale originally devised by Pepinsky, Siegel, and Van Atta and 
tentatively concluded that it is reliable and valid. On this instrument, 
group participants rate each other on such behaviors as “initiate, sustain, 
define, and direct activity toward goals which were held by the group.” 
In summary, the work in this area is promising and encouraging. The 
technic appears to be quite useful for obtaining intimate data about intra- 
group relations which are not easily accessible to the observer’s eye or 
to other forms of paper-and-pencil tests. 


Sociometric Instruments 


An area of considerable activity lies in the development and refinement 
of sociometric-type instruments. Marshall (35) reviewed previous at- 
tempts to study the relationship between sociometric choices of preschool- 
age children and several criteria of social behavior. She concluded that 
the use of antiquated statistical procedures and methods of investigation 
had not yielded fruitful results. Acting on this conviction, McCandless 
and Marshall (33) constructed a picture sociometric test for use with pre- 
school-age children, utilizing large photographs of children in the same 
group and several oral sociometric-type questions. In a later study (36) 
they investigated the relationship between choices of friends and such 
variables (observed by a group of sophisticated judges in two-minute 
segments) as associative play, friendly approach, conversation, hostile 
interaction, attention, and no response. The outcomes suggest that this 
may be a valuable technic. 


In an approach distinguished by its originality, Gardner and Thompson 
(19) described the development of five social relations instruments. In 
constructing their scales, which yield near normal distributions, the au- 
thors tried to take into account such factors as (a) the ambiguity of 
needs underlying the choice, (b) the inequality of rankings by means of 
a nominations approach, and (c) the lack of generalizability of data from 
one group to another. By manipulating the obtained data, eight indexes 
of an individual’s social relations status in a group and nine indexes of 
social group structure are calculable. 

Not all the activity in this field was in the development of new instru- 
ments. Mouton, Blake, and Fruchter (42, 43) analyzed 53 studies con- 
ducted in military, industrial, and educational settings and concluded 
that the sociometric-type test has considerable reliability and validity. 
They also suggested some of the variables which affect data and com- 
mented on additional uses for the obtained information. Davitz (14) 
studied the relations between sociometric choice and perceived similarity 
and dissimilarity. Of particular interest to other investigators should be 
the manipulation of sociometric instruments so as to measure perceptions 
of similarity and dissimilarity. Vidich and Shapiro (56) in a study using 
a large sample found that measures of prestige as revealed by a socio- 
metric-type questionnaire and an anthropological field-worker’s ratings 
were complementary but not overlapping. 

The number of choices to be allotted a subject, another area of concern 
in the construction of sociometric tests, was investigated by Gronlund 
(20). He was interested in the stability of weighted and unweighted scores 
based on three, four, or five choices. Tagiuri, Bruner, and Kogan (53) 
developed a mathematical model for computing the chance frequency and 
variance of the dyadic relationship obtained within relational analysis. 
Keislar (31) constructed a special scoring formula for the “Guess-Who” 
type questionnaire which is claimed to be more valid than and just as 
reliable as “older” technics. The special advantages of the “new” formula 
lie in the acquisition of a normal distribution and the minimizing of the 
effect of unequal familiarity of all members in a large group. 
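
A much simpler baseline than the relational-analysis model mentioned
above is the classical chance expectation for mutual choices, which
assumes that each group member picks a fixed number of others purely at
random. The Python sketch below computes only that baseline; it is not the
model of Tagiuri, Bruner, and Kogan, and the function name and example
values are hypothetical.

```python
def expected_mutual_pairs(group_size, choices_per_member):
    """Expected number of mutual pairs if every member chooses at random.

    Each member picks `choices_per_member` of the other members uniformly
    at random, so a given member chooses a given other with probability
    d / (N - 1), and the two directions of a pair are independent.
    """
    N, d = group_size, choices_per_member
    if not 0 <= d <= N - 1:
        raise ValueError("choices_per_member must lie between 0 and N - 1")
    # C(N, 2) pairs, each mutual with probability (d / (N - 1)) ** 2
    return N * d * d / (2 * (N - 1))

chance_pairs = expected_mutual_pairs(group_size=10, choices_per_member=3)
```

An observed count of mutual pairs well above this figure suggests
structure beyond random choice, although a variance term would be needed
for a formal test.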

Barr (4), Luebke (32), and Hale (21) discussed ways of analyzing 
and charting or mapping the results of sociometric tests. Hale’s particular 
contribution was the development of five criteria, derived from current 
group dynamics literature, which might be used to measure the social 
growth of a group over a period of time as revealed by test-retest socio- 
metric data. 


Direct Observational Technics 


A more time-consuming technic for studying group-member interaction 
is the direct observation of a group in some artificial or natural setting. 


This form of obtaining information requires large amounts of time because
judges or raters must be trained and because more raters are usually
needed than is the case with sociometric or peer-group
ratings. As one might expect, however, direct observation is particularly 
effective in investigations of certain kinds, for example, communications, 
problem solving, and the like. Bales (2) designed a 12-category interaction 
scale which allowed the trained observer to capture the flow of interactions 
within a group. Altho the original scale was completed prior to 1954, the 
article reviewed is particularly useful for the novice or the lay individual 
interested in dynamics of small-group interaction. Withall (62) described 
a technic for obtaining a measure of the teacher’s classroom interactions 
which eliminated the use of many judges, but on the other hand, probably 
entailed a large expenditure of money for equipment. Altho no evidence 
was presented to show that the technic used was reliable or valid, the use 
of a time-lapse camera clicking pictures every 15 seconds and a sound tape 
recording must be considered an ingenious way to obtain information 
about teacher-pupil interactions. Moustakas, Sigel, and Schalock (41) 
developed an observation schedule which has 82 units for describing child 
behavior and 89 units for describing adult behavior. The categories were 
designed to measure overt behavior and none was constructed to be inter- 
pretive or evaluative. The authors claimed that observer and category re- 
liability were high. 

Most of the experimentation in group dynamics is designed to 
include some combination of observational and paper-and-pencil technics. 
The concluding pieces of research to be reviewed will be considered ac- 
cording to the problem area they represent. 


Communications 


The most frequently utilized technic in the study of communications 
is direct observation. Hearn (24) used the Bales categories to determine 
the direction of remarks between members of the group in leaderless and 
in trainer-dominated sessions. Porter (47) studied the relationship between 
the type of participation in a small-group discussion, as measured by the 
Bales categories, and feelings of satisfaction. Cervin (10) predicted from 
a stimulus-response-type model that a person of high emotional respon- 
siveness would speak first, participate more, and change his opinions less 
than a person of low emotional responsiveness. Emotional responsiveness, 
as measured on a Guttman type paper-and-pencil technic, and observation 
of the group supported the original predictions. 


Group Problem Solving 


The comparison of the effectiveness of group versus individual prob- 
lem solving, the development of criteria for measuring effectiveness, and 
the process of group problem solving were of considerable interest to in- 
vestigators. Dickens (15) devised a formula based on the hypothesis that 
one aspect of effectiveness is the spread of participation among group 
members. McCurdy and Eber (34) defined effectiveness on a three-man 
light-switch series task in such behavioral terms as (a) time per unit of 
work, (b) correct switch turnings per unit of work, (c) errors per unit of 
work, and (d) errors per unit of time. These criteria seem to be a success- 
ful attempt to locate facets of a situation which may be objectively ob- 
served and recorded. Torrance (55) had members of bomber crews re- 
spond to projective-type pictures construed to be “psyche-group” and 
“socio-group” oriented. He studied the relationship between a group’s 
perceptions of its functioning and its actual performance according to 
military criteria. Fiedler (16) studied the relationship between a basketball 
team’s effectiveness and the assumption of similarity among its members. 
The measure of assumed similarity was derived by analyzing the responses 
that the team members made for each other on a questionnaire. The merit 
of these methods apparently lies in the depth of analysis they permit the 
investigator. 

Hays and Bush (23) employed a mathematical model to predict group 
decision making. Their models took into account two types of group action: 
the group-actor model where individuals decide together, and the voter 
model where the individuals’ choices are independent and the decision is 
by majority vote. Damrin (13) developed a unique problem-solving test 
under the sponsorship of the Russell Sage Foundation. This test involved 
the use of a set of 36 interlocking blocks in different colors. Each member 
of the group was given one or two blocks and the group was then instructed 
to make a plan for reproducing a model figure constructed from another 
set of blocks shown them by the test administrator. The test was success- 
fully used from the third-grade level to adult groups, and wide variations 
in the quality of group performances were noted at every level. Method- 
ological problems connected with the technic include (a) quantification 
of group performance from observers’ protocols, (b) reduction of bias 
introduced by personality of test administrator, and (c) the effects of group
size on performance. The Russell Sage Social Relations Test will
undoubtedly be refined and modified to become an important measure of
group performance. Even without these refinements, in its present stage
of development the RSSR Test provides valuable demonstration material 
for workshops interested in cooperative educational goals. 
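
The voter model mentioned earlier in this section, in which members
choose independently and the decision goes to the majority, lends itself
to a short calculation. The Python sketch below assumes that every member
independently favors one alternative with the same probability p and that
the group size is odd; this is a simplified binomial illustration, not the
formulation used by Hays and Bush, and the names are hypothetical.

```python
from math import comb

def majority_probability(p, n):
    """Probability that a majority of n independent voters favors an alternative.

    Each voter favors the alternative with probability p (an assumption of
    this sketch); n must be odd so that a majority always exists.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd so that a majority always exists")
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )

three_member = majority_probability(0.6, 3)
nine_member = majority_probability(0.6, 9)
```

Under these assumptions a larger group is more likely than a smaller one
to adopt the alternative that individual members favor more than half the
time.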


Conformity 


An area of increasing concern to social psychologists is conformity be- 
havior. Matthews and Bendig (37) formulated a statistic, the “Index of 
Agreement,” which quantifies the amount of agreement between group 
discussants. Simon and Guetzkow (49) devised a mathematical model to 
examine Festinger’s five hypotheses relevant to group pressure, communi- 
cations, and movement toward uniformity of opinion. Five equations were 
developed which take into account a correction for time and a variable 
designated as feedback. This particular construct is very useful in explain- 
ing much of the variance in Festinger’s and in his colleagues’ experiments. 
Blake and Brehm (6) in a novel project demonstrated how a tape record- 
ing of a simulated group could produce conformity on the autokinetic 
perceptions of an uninitiated group member. Burdick (8) also produced 
conformance of opinion about juvenile delinquency by using a simulated 
group discussion. The methodology of these two studies and their results 
seem to suggest a very economical way to study group pressure effects and 
perhaps other group effects upon individual behavior. Crutchfield (12) 
described a laboratory-type arrangement for studying conformity behavior. 
His “quasi group-interaction method” serves to standardize the situation; 
hence the stimuli for each subject are identical. 


Social-Emotional Climate 


Forlano and Wrightstone (18) revised the Ohio Social Acceptance Scale, 
a peer-rating instrument, to assess the quality of social acceptance in a 
classroom. Social acceptance was operationally defined as the percentage 
difference between the median acceptance ratings and rejection ratings 
within a class. The results of their study suggest a marked use for this 
measure. Wandt and Ostreicher (57) devised 14 scales, each containing a 
nine-step interval to measure the consistency of teacher behavior in the
area of social-emotional climate. Gardner and Thompson (19) developed 
several technics for measuring “esprit de corps” and “group effectiveness” 
components of morale. 

Phillips and D’Amico (46) purported to measure group cohesiveness by 
observing changes in response on a pre-post sociometric-type test. The 
resulting changes were analyzed to study the relationship between cohesive- 
ness and the social-emotional climate the group was required to work 
within while solving a problem. Pepitone and Kleiner (4) formed groups 
of male campers into teams on the basis of sociometric choices and ob- 
served the changes in cohesiveness as the result of varying the probability 
of status gain or loss. 


Role Behavior 


Many of the studies of role behavior utilized a combination of direct 
observational technics and subject self-ratings or group-ranking proce- 
dures. Talland (54) used the Bales Interaction categories and a thera- 
peutic interaction analysis scale to study the structuring over a period of 
time of 15 initially informal psychotherapy groups. The subjects also 
ranked their fellow group members on a scale designed to measure status 
perception. Slater (50) likewise used the Bales categories to study role 
differentiation as affected by the manipulation into high-status groups 
(members perceive themselves as similar in problem-solving effectiveness)
and low-status groups (members perceive themselves as dissimilar). 
Mitchell (39) observed six sessions of a citizens committee seeking to 
evaluate itself. The roles played by the various members were categorized 
as problem-solving behavior, maintenance of group atmosphere, and 
“group blocker” behavior. 

Another method for obtaining information relevant to the role an indi- 
vidual and/or his group perceive him playing is to request responses from 
the group and the individual on such devices as sociometric tests and peer- 
self ratings. Crowell, Katcher, and Miyamoto (11) designed two question- 
naires dealing with a person’s feelings about his skills as a communicator 
and a communicant. Subsequently, self- and group-member-ratings of per- 
formance in three discussion groups were related to the self-perceptions of 
the individuals. By manipulating the instructions on the same sociometric- 
type instrument, Hollander and Webb (29) were able to study the relation- 
ship between leadership, followership, and friendship among a group of 
naval air cadets. Carter (9), as the result of a careful factor analysis of a 
great many previous studies, found three factors which might be useful in 
the study of leadership behavior. The three which emerged from the study 
were (a) achieving various personal goals, (b) aiding attainments by the 
group, and (c) sociability. 


Assessment and Selection


Ever since the OSS assessment program demonstrated the feasibility of 
using group technics in selecting men for critical jobs, other organizations 
have attempted to use these technics in the selecting and training of candidates
for vital positions. Perhaps the major methodological problem for
the assessment team is the development of situational tests which yield
data relevant to prediction of individual performance on a job. Weislogel 
and Schwarz (60) suggested three criteria for designing an effective prob- 
lem. Crutchfield (12), in dealing with the problem of measuring conform- 
ity to group judgment, suggested criteria which should be accounted for 
in the construction of any type of situational test. Bringing their experience 
to bear, Bales and Flanders (3) discussed the planning necessary for set- 
ting up a functional observation room for studying small groups. They 
treated such topics as comfort and space requirements, needs of research 
staff, visiting spectators, and the design of the experimental room. 

Bass (5) studied the utility of leaderless-group discussions for evaluating 
leadership behavior and concluded that the technic had high validity as 
well as predictive value. Wilson and Robbins (61) reported the use of 
leaderless-group discussions as one aspect of a selection program for poten- 
tial guidance workers. This article dealt particularly with such problems 
as the size of the group being observed, the number of judges, the effect 
of judges’ presence, and the length of the discussion period. In consider- 
ing the usefulness of these technics, one must also be concerned with the 
handling of the accumulated information. Morris (40) studied some of the 
factors which affect the validity of the judgment of assessors and concluded 
that a consensus of judges’ ratings may result in more valid judgments. He 
further asserted that one must analyze the grouping factors which may 
influence the judges’ consensus. Altho this technic takes more time, both 
in terms of testing hours and rater time, one can only hope that other 
organizations will begin to find this technic promising enough to experi- 
ment with it in the selection and assessment of candidates for various 
positions. 


Conclusion 


The prognosis for future improvements and refinements in the meth- 
odology of observing and recording behavior in groups seems good. The 
period of this review is highlighted by the replications of previous studies, 
the development of mathematical models, an increase in interdisciplinary 
research as exemplified by the Ohio State studies and the December 1954 
issue of the American Sociological Review (1), and the expenditure of 
large amounts of money for long-range studies in natural settings, for 
example, military and industry. 

What must follow in the coming years if group theory and methodology
are to advance is (a) the continuous development of more rigorous
technics of locating and measuring variables connected with group
characteristics and group structure, (b) a closer relationship between 
theory and data gathering (too many studies appear to gather data and fit 
a theory to them rather than vice versa), and (c) a greater uniformity in 
semantics. Hemphill (25) called for the development of a taxonomy of 
group characteristics which would operationally define its terms as well as 
permit accurate measurement. Altho these objectives may seem idealistic, 
it is unlikely that advancements in group theory or methodology can con- 
tinue without some attempt to approach them. 


CHAPTER VII


Research Tools: Scaling and Measurement Theory


SAMUEL MESSICK and ROBERT P. ABELSON* 


This chapter covers scaling theory from June 1954 to June 1957. Limited
space precluded several pertinent references, and those included reflect
an attempt to give adequate coverage to developments deemed important 
by the reviewers. Elementary texts on scaling methods by Guilford (35) 
and Edwards (25) appeared during this period. 


Theories of Measurement 


Old controversies about fundamental measurement in psychology were 
recently revived. Siegel (70) believed that arithmetic operations and statis- 
tical technics based upon them were limited to scales with interval prop- 
erties, since “the operations allowable on a given set of scores are depend- 
ent on the level of measurement achieved.” Stevens (72) concurred, but 
Savage (66) disagreed. Mount (61) denied the traditional necessity of 
demonstrating physical manipulations paralleling various scale operations 
and believed numerical measurement to be justified merely by the specifica- 
tion of instructions for assigning objects or attributes to a numerical 
reference system. 

The recent appearance of a strong trend toward axiomatization in scal- 
ing theory will undoubtedly contribute to a resolution of such contro- 
versies. Suppes (75), starting from a set-theoretical definition of a theory 
of measurement, formalized the logical foundations of scaling and pro- 
vided an axiomatic treatment of models underlying ordinal, interval, and 
ratio scales. Scott and Suppes (67) also discussed logical aspects of 
measurement, particularly the conservation of axioms. Coombs (17) mean- 
while clarified some properties of underlying models by classifying inter- 
relationships between the scaling of individuals and the scaling of stimuli. 
In another classificatory scheme, Coombs (16) located known scaling 
methods in a 2 x 2 x 2 table. 


Axiomatic Models 


Several axiomatic models for different types of measurement are con- 
sidered together in this section solely for convenience in referring to a 
unified approach to model construction. Davidson and Suppes (20) axio- 
matized utility and subjective probability in terms of an indifference re- 
lation for alternatives equally spaced in utility. Adams and Fagot (3) 
intensively analyzed a two-dimensional measurement model assuming
additive utilities. Luce (49) formulated a probabilistic theory of utility,
and a theory of utility discrimination in terms of “semi-orders” (50).
In a semi-order the indifference relation need not be transitive; that is,
a=b and b=c do not necessarily imply a=c. A simplified axiomatization
of semi-orders was given by Scott and Suppes (67). Galanter’s attempt
(29) to apply experimentally an intransitive matching relation serves to
illustrate the difficulties currently encountered by many axiomatic models
in handling the concept of error.

* The reviewers are indebted to Frederic M. Lord for his helpful comments on the manuscript. A
slightly expanded version of this chapter with a 205-item bibliography will be distributed as an
Educational Testing Service Research Bulletin available from the authors.


Luce (51), in perhaps the most important paper reviewed in this chap- 
ter, formulated a probabilistic theory of individual choice behavior based 
upon an intuitively reasonable axiom with extremely general and powerful 
consequences. The basic axiom may be stated as follows: If T is a finite
set of elements and R is a subset of S, which is a subset of T, the probability
P(R;T) that an individual will select an element contained in the subset
R when the choice is restricted to T is equal to the probability that the
chosen element is in R when selected from S times the probability that it is
in S when chosen from T; that is, if T is finite and R ⊂ S ⊂ T, then
P(R;T) = P(R;S)P(S;T). One important consequence of this axiom is
the existence of a ratio scale v, which may be determined in several ways
from different probability combinations. The relation between paired-comparison
probabilities, P(x,y), and the underlying scale values v(x) is as
follows: $P(x,y) = \frac{v(x)}{v(x) + v(y)}$.
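
The product rule and the paired-comparison relation can be checked numerically. The following is a minimal sketch in Python, assuming a hypothetical ratio scale v over four alternatives and assuming that choice probabilities are proportional to scale values; the function names and the numbers are illustrative only and are not taken from Luce (51).

```python
def choice_probability(v, subset, choice_set):
    """P(R;T): probability that the chosen element of T lies in R,
    assuming choice probabilities proportional to the scale values v."""
    if not set(subset) <= set(choice_set):
        raise ValueError("subset must be contained in the choice set")
    return sum(v[x] for x in subset) / sum(v[x] for x in choice_set)


def pairwise_probability(v, x, y):
    """P(x,y) = v(x) / (v(x) + v(y)) for a two-element choice set."""
    return v[x] / (v[x] + v[y])


# Hypothetical scale values, chosen only for illustration.
v = {"a": 4.0, "b": 2.0, "c": 1.0, "d": 1.0}
T = ["a", "b", "c", "d"]
S = ["a", "b", "c"]
R = ["a"]

direct = choice_probability(v, R, T)
via_s = choice_probability(v, R, S) * choice_probability(v, S, T)
p_ab = pairwise_probability(v, "a", "b")
```

Here `direct` and `via_s` agree, which is the statement P(R;T) = P(R;S)P(S;T) for this particular scale, and `p_ab` depends only on the ratio of the two scale values.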


Decision Process and Utility Measurement 


Davidson, Suppes, and Siegel (21) tested their axiomatic model of 
utility (20) in a betting situation. Subjects were confronted with choices 
between two options: one offering a 50-50 chance of winning a or losing 
b and another offering a 50-50 chance of winning c or losing d. Preferences 
between options were used to order utility differences among elements. 
Considering the fact that these models make little provision for handling 
error, the goodness of fit obtained is very encouraging. Siegel (69) also 
used preference between probability combinations of stimuli to order scale
intervals. Such options as a 50-50 chance of winning a or c versus certainly 
receiving b were used to generate a “higher-ordered metric scale.” Hurst 
and Siegel (42) applied this scaling technic by betting with prison inmates 
for cigarettes. They reported that utility was not linear with number of 
cigarettes. Their treatment of scaling errors was vague. Davidson and 
Marschak (19) presented experimental evidence that choice behavior may 
be considered “stochastically transitive.” Edwards (26) employed experi- 
mentally an additive hypothesis (3) that the utility of n identical bets is n 
times the utility of one such bet. 
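
The way a preference between two 50-50 bets orders utility differences can be sketched as follows. This is an illustration only, assuming a chooser who maximizes expected utility with a subjective probability of one-half for each outcome; the outcome labels and utility values are hypothetical and do not come from the studies cited.

```python
def expected_utility_5050(utility, win, lose):
    """Expected utility of a bet yielding `win` or `lose` with equal probability."""
    return 0.5 * utility[win] + 0.5 * utility[lose]


def prefers_first(utility, option_1, option_2):
    """True if the first (win, lose) option has the higher expected utility."""
    return expected_utility_5050(utility, *option_1) > expected_utility_5050(utility, *option_2)


# Hypothetical utilities for winning a or c and for losing b or d.
utility = {"win_a": 4.0, "lose_b": -3.0, "win_c": 2.5, "lose_d": -1.0}

choice = prefers_first(utility, ("win_a", "lose_b"), ("win_c", "lose_d"))

# The same preference, restated as an ordering of two utility differences:
# option 1 is preferred exactly when u(a) - u(c) exceeds u(-d) - u(-b).
difference_in_wins = utility["win_a"] - utility["win_c"]
difference_in_losses = utility["lose_d"] - utility["lose_b"]
same_ordering = (difference_in_wins > difference_in_losses) == choice
```

Because the probabilities are equal, they cancel from the comparison, so each observed preference yields one inequality between utility intervals without any assumption about the shape of the utility function.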

The use of bets in decision studies represents an interesting gimmick for 
obtaining metric information about utility intervals while avoiding the 
interaction effects that may plague methods using composite stimuli. At
first glance, these betting procedures seem of dubious value in psychologi- 
cal contexts. On the other hand, traditional scaling methods may have 
overlooked the role of subjective probability in individual choice behavior. 
Responses to questionnaire items may perhaps be fruitfully regarded as 
decisions under uncertainty—uncertainty about circumstances in which an
item is meant to apply, about the use to which the responses will be put,
and the like. 


Coombs’s Models and Methods 


Coombs’s scaling models have been criticized for their inability to handle 
error. Tho Coombs did not come forth with a full-dress theory of error, 
a tendency to meet this problem can be observed. Coombs (15) presented 
an assortment of judgmental methods for scaling stimulus similarity, and 
incidentally recommended replication of judgments as a means of catch- 
ing occasional errors. Dember (22) applied one of these technics to the 
scaling of gray patches differing in brightness. He suggested the use of 
differential reaction times to resolve inconsistent judgments. 

Runkel (65) applied unfolding technic as an indicator of interpersonal 
“cognitive similarity.” Coombs (18), in discussing the construction of a 
“social utility scale,” was able to apply the unidimensional unfolding 
technic. In this treatment he often wrote as tho an interval scale allowing 
continuous distributions was uppermost in his mind. Thus it would appear 
that Coombs has budged slightly from the extreme position sometimes 
imputed to him. 


Paired Comparison Models 


Morrissey (60) and Gulliksen (37) independently presented equivalent 
least squares solutions for incomplete paired comparisons. The latter 
suggested an iterative procedure which markedly reduces computation 
time. Gulliksen and Tukey (39) proposed a variance-components analysis 
for the reliability of paired comparisons. Gulliksen (36) utilized a quantity 
called the “comparatal dispersion” ($\sqrt{\sigma_i^2 + \sigma_j^2 - 2 r_{ij}\sigma_i\sigma_j}$) to measure accuracy
of paired comparison judgments. Harris (40) revised Thurstone’s
Law of Comparative Judgment to include asymmetries between a com- 
parison pair AB and its experimentally independent complement BA. The 
revision permits the extraction of time error and order effects. Gulliksen 
(38) tested four laws for predicting the scale value of a composite stimulus 
from the scale values of its components. Linear and negative exponential 
curves both gave a good fit for food preferences. Rimoldi (62) also ob- 
tained a linear relationship for predicting scale values of combined stimuli 
from components in a study of “famous men you'd like to know.” In this 
way the Thurstone methods, thru the scaling of composite as well as single 
stimuli, offer an alternative to bets in the measurement of utility (21). 

Kendall (45) derived scale scores by finding the principal characteristic
vector of a paired-comparisons matrix. The technic is applicable to incom- 
plete data and may be used on a single set of judgments from one indi- 
vidual. Suppes (75) gave an axiomatic formulation of a stochastic model 
for paired comparisons which assumes that P(a,b), the proportion of times 
a is preferred to b, is a monotone increasing function of the difference in 
scale values; that is, P(a,b) ≥ P(c,d) implies v(a)−v(b) ≥ v(c)−v(d).
Suppes described a linear programing solution to these inequalities. 
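
The idea of taking scale scores from the principal characteristic vector of a paired-comparisons matrix can be sketched with simple power iteration. The code below is a rough illustration and not a reproduction of Kendall's procedure; it assumes a complete matrix of hypothetical preference proportions with one-half on the diagonal, and it does not handle the incomplete-data case.

```python
def principal_vector(matrix, iterations=200):
    """Approximate the principal characteristic vector by power iteration,
    rescaling after each step so that the scores sum to one."""
    n = len(matrix)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        product = [sum(matrix[i][j] * scores[j] for j in range(n)) for i in range(n)]
        total = sum(product)
        scores = [value / total for value in product]
    return scores


# Hypothetical proportions: entry [i][j] is the proportion of judgments
# in which stimulus i was preferred to stimulus j (0.5 on the diagonal).
proportions = [
    [0.5, 0.7, 0.9],
    [0.3, 0.5, 0.6],
    [0.1, 0.4, 0.5],
]

scores = principal_vector(proportions)
```

Since every entry of the matrix is positive, the iteration settles on a single vector of positive scores, here ordering the three stimuli in the same way as their row totals.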

In the Bradley-Terry model (11, 34) for paired comparisons, P(a,b) is
related to scale parameters π by $P(a,b) = \frac{\pi_a}{\pi_a + \pi_b}$. This relationship also
appeared in the Luce model (51) discussed above. Thus the “seemingly
arbitrary” (47) Bradley-Terry formulation is by hindsight justified thru
Luce’s axiom. Furthermore, a powerful means for applying the Luce model
experimentally is thru the extensive Bradley-Terry machinery (12). Abelson
and Bradley (2) applied the method to stimuli with combined attributes
in a 2 x 2 factorial. Such an application suggests a multidimensional
extension of the model for known stimulus dimensions. It is of interest that
if the logistic curve is substituted for the normal ogive in the Thurstone
model, $\log \pi_a$ corresponds to Thurstone’s scale value (cf. 34). Adams and
Messick (4) proved that the only distribution function allowing the Luce
model (51) and Suppes’s monotone model (75), of which Thurstone’s
Case V is a special form, to fit the data simultaneously is the logistic. In
view of the coordinating power of Luce’s axiom and the negligible difference
in practice between the logistic curve and the normal ogive, the use
of the logistic for scaling is compellingly suggested. However, empirical
comparisons (8, 43) of several different paired comparison models have
resulted in remarkably similar scale values.
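
Two points in the paragraph above lend themselves to a quick numerical check: that the Bradley-Terry probability is a logistic function of the difference in log scale parameters, and that the logistic curve and the normal ogive differ only slightly. The sketch below is illustrative; the parameter values are hypothetical, and the rescaling constant 1.702 is a conventional choice assumed here, not one given in the sources reviewed.

```python
import math


def bradley_terry(pi_a, pi_b):
    """P(a,b) = pi_a / (pi_a + pi_b)."""
    return pi_a / (pi_a + pi_b)


def logistic(x):
    """Logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-x))


def normal_ogive(x):
    """Unit normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


# Hypothetical scale parameters for two stimuli.
pi_a, pi_b = 3.0, 1.0
p_bradley_terry = bradley_terry(pi_a, pi_b)
p_logistic = logistic(math.log(pi_a) - math.log(pi_b))

# Largest gap between the normal ogive and a rescaled logistic curve,
# evaluated on a grid from -5 to 5 (1.702 is an assumed scaling constant).
grid = [step / 100.0 for step in range(-500, 501)]
max_gap = max(abs(normal_ogive(x) - logistic(1.702 * x)) for x in grid)
```

The two probabilities coincide, so log π plays the role of a scale value on a logistic model, and the gap between the two curves stays below one percentage point over the whole grid.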


Categorical Judgment Methods 


Scaling solutions for successive intervals have taken several forms under 
various names, but they reflect essentially the same basic model. Since 
some of these solutions (cf. 64) assumed equal stimulus dispersions, the 
flexibility of unequal dispersions (13) in the general model has been 
largely overlooked (74). The basic Thurstone formulation was axiomatized 
by Adams and Messick (4) and generalized to nonnormal stimulus dis- 
tributions. Diederich, Messick, and Tucker (23) derived a weighted least 
squares solution for successive intervals which is applicable to incomplete 
data; the corresponding punched-card procedures were also outlined (57). 
Rozeboom and Jones (64) analyzed the effects of various errors upon 
successive interval scale values and found the scales to be stable under 
sampling fluctuations and insensitive to small departures from normality 
and equality of dispersions. Guilford (35) suggested a chi-square test of 
goodness of fit for successive intervals which inappropriately assumes 
independence of proportions across categories. Bock (9) proposed assigning
weights to ordered categories such that n judges classifying m stimuli r
times each produce a maximum variance ratio for the stimuli. The technic 
permits the statistical rejection of deviant judges. Rosner (63) assayed a 
category judgment model in which scale values are directly calculable 
from the probability densities of stimuli over response categories. The 
derivation implicitly contains the restrictive assumption that all finite 
category widths are equal. 

Morris and Jones (59) applied to preferences among “ways of life” in 
five cultures a factor analysis of successive intervals data. Kelley and 
others (44) noted that extreme discrepancies between equal-appearing 
interval attitude scales for Negro versus white judges were substantially 
decreased by successive intervals technics and almost eliminated by paired 
comparisons. Evidently judgmental distortion effects as a function of atti- 
tude do exist but are extremely subtle. Gardner and Thompson (30) care- 
fully constructed interpersonal rating scales by having subjects select 
anchoring individuals from an outside reference group. Cliff (14), scal- 
ing adjectives and adjective-adverb combinations, obtained exceptional 
support for the hypothesis that the common adverbs of degree serve to 
multiply the intensity of adjectives. 


Psychophysical Scaling 


Stevens and Galanter (74) produced a pivotal monograph which differ- 
entiated psychophysical methods into two classes: those yielding “category 
scales” (equal-appearing intervals, successive intervals, paired compari- 
sons, triads, and equisection) and those yielding “magnitude scales” 
(fractionation, doubling, constant sum, and magnitude estimation). The 
assembled data depicted category and magnitude scales as nonlinearly 
related to each other. From this the implication was drawn that category 
scales are artifactual, while magnitude scales are true scales. To forestall 
the reader from drawing the opposite conclusion, two factors tending to 
distort category scales were enumerated: heterogeneity of subjective cate- 
gory widths and of stimulus dispersions. One would look to the method 
of successive intervals to correct these two sources of distortion in equal- 
appearing intervals. Extraordinarily, the huge Stevens and Galanter com- 
pilation of category scales does not include any successive intervals 
analyses. It would be of utmost interest to apply directly to the assembled 
psychophysical data the method of successive intervals allowing unequal 
stimulus dispersions. 

Controversy over the properties arising from magnitude scale methods 
still rages. Garner (31) pointed out that fractionation technics suffer from 
the unlikely assumption that the verbalized stimulus ratio corresponds to 
the true stimulus ratio and proposed a method using fractionation and 
equisection jointly. Apart from the content of the argument, the present 
reviewers lament the usage: true ratio scale. No single criterion for such 
absolute truth presently exists. Stevens (73) summarized evidence that 
many psychophysical scales derived by magnitude methods approximate
a power function law. Thus the magnitude methods are properly justified
by their excellent ability to coordinate empirical data. Simple faith in
ratio-scales, however, via the application of magnitude methods to expressive
faces (28), neckties (24), and the like is not likely to prove especially
rewarding.
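
A power function law of the form ψ = kφⁿ becomes a straight line when both variables are expressed in logarithms, so the exponent can be estimated by ordinary least squares on the logged values. The following sketch assumes that form and uses synthetic, error-free data generated for illustration; none of the numbers come from Stevens (73).

```python
import math


def fit_power_law(stimulus, response):
    """Least-squares fit of response = k * stimulus**n on log-log axes.
    Returns the pair (k, n)."""
    xs = [math.log(s) for s in stimulus]
    ys = [math.log(r) for r in response]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), slope


# Synthetic magnitude estimates generated from k = 2.0 and n = 0.6.
intensities = [1.0, 2.0, 4.0, 8.0, 16.0]
estimates = [2.0 * s ** 0.6 for s in intensities]

k, exponent = fit_power_law(intensities, estimates)
```

With real judgments the points scatter about the line, and the quality of the fit, not the mere existence of an exponent, is what bears on the coordinating power claimed for the magnitude methods.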


Multidimensional Scaling 


Many contributions to multidimensional scaling theory and method 
have followed the impetus given the field by Torgerson. Developments up 
to 1955 were reviewed by Messick (56). The multidimensional method of 
successive intervals (MDSI) was empirically evaluated (54) in the area 
of color perception by comparisons with scale values obtained from the 
Munsell color system and from the complete method of triads. The cor- 
respondence between these sets of scale values was exceptionally good, the 
proportions of common variance exceeding .95 (68). Mellinger (53) also 
applied MDSI to a variety of colors and obtained a separate dimension 
for each hue. However, the colors were extremely disparate, and the result 
would probably not have occurred with a more circumscribed set of 
stimuli. Agreement between scaling in the large with supraliminal differ- 
ences and scaling in the small with confusions is not necessarily to be ex- 
pected. Abelson (1) and Messick (55) used MDSI to compare attitude 
perceptions for two diverse groups. They both found merely minor varia- 
tions between groups, suggesting that judges’ attitudes distort only slightly 
their perceptions of attitude relationships. 

Shepard (68) proposed a model for using confusion errors in paired 
associate learning as a basis for multidimensional scaling. He applied this 
important new technic to confusions among color stimuli, and the proportions
of variance common with earlier analyses (56) exceeded .97. Ekman
(27) scaled pairs of color stimuli with respect to similarity by the method 
of equal-appearing intervals and factored the obtained matrix of similarity 
scores directly! Such an analysis probably routinely results in spurious 
added dimensions. This criticism also applies to Andrews and Ray (6), 
who interpreted a simple square root function of P(a,b) as a correlation 
coefficient to be factored directly. Wilson (78) defined a simple but some- 
what arbitrary distance function relating squared distance to the propor- 
tion of times two stimuli are judged similar in the method of triads. 
Tucker (76) discussed a vector model for paired comparisons which yields 
a dimensional representation of the stimulus space as well as a group pref- 
erence scale. 


Guttman Scale Analysis 


Guttman’s coefficient of reproducibility (Rep) typically has a variously 
high expected value under the null hypothesis of item independence. Obtained
Reps are therefore apt to be misleading, and a number of alternative
coefficients have been suggested. Green (33) proposed an “Index of Consistency”
with the formula, $I = \frac{\text{Obtained Rep} - \text{Chance Rep}}{1 - \text{Chance Rep}}$. Borgatta (10)
recommended a different index based on much the same grounds. White
and Saltz (77) reviewed many but not all the alternative coefficients and
favored Green’s because of an accompanying approximate significance test.
Milholland (58) considered nonstandard assessments of reproducibility.
His most intriguing suggestion was a coefficient giving the percentage of
individuals whose response patterns fit perfect scale types, with multidimensional
Coombsian scale types included. Slater (71) revived a technic,
originally suggested by Guttman in 1941, that simultaneously assigns
scores to people and to item responses according to a maximum variance-ratio
principle. The technic applies to items with any number of alternatives.
A significance test for scalability was given. Lord (48) showed that
Guttman’s principal components are the scoring weights that maximize
the generalized Kuder-Richardson reliability coefficient and that the principal
component for any dichotomous item is the same as the ordinary
factor loading of the item divided by the item standard deviation. Applications
of Guttman scaling were legion; space precludes their inclusion here.
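
The relation between an obtained Rep and a chance-corrected index can be illustrated on a small table of dichotomous responses. The sketch below is a simplification under stated assumptions: errors are counted as departures from the ideal pattern implied by each respondent's total score, which is only one of several counting conventions; the chance Rep is supplied as a hypothetical number and not derived from the item marginals; and the index takes the form (Obtained Rep − Chance Rep) / (1 − Chance Rep). The response data are invented.

```python
def reproducibility(responses):
    """Rep = 1 - errors / (respondents * items), where errors are cells that
    differ from the ideal pattern implied by each respondent's total score."""
    n_items = len(responses[0])
    popularity = [sum(row[j] for row in responses) for j in range(n_items)]
    # Items ordered from most to least frequently endorsed.
    order = sorted(range(n_items), key=lambda j: -popularity[j])
    errors = 0
    for row in responses:
        score = sum(row)
        ideal = [0] * n_items
        for j in order[:score]:
            ideal[j] = 1
        errors += sum(1 for observed, expected in zip(row, ideal) if observed != expected)
    return 1.0 - errors / (len(responses) * n_items)


def index_of_consistency(obtained_rep, chance_rep):
    """Chance-corrected index: (obtained - chance) / (1 - chance)."""
    return (obtained_rep - chance_rep) / (1.0 - chance_rep)


# Invented responses of six persons to four items (1 = endorsed).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 1, 0],  # departs from the ideal pattern for a score of 2
    [0, 0, 0, 0],
]

rep = reproducibility(responses)
# Hypothetical chance Rep of .80, assumed for illustration only.
consistency = index_of_consistency(rep, 0.80)
```

An obtained Rep above .90 shrinks to an index below .60 once a chance value of .80 is allowed for, which is the sense in which uncorrected Reps are apt to mislead.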


Latent Structure Analysis 


Anderson (5), Gibson (32), and McHugh (52) offered improved 
matrix solutions for latent class parameters. McHugh gave a significance 
test for the number of latent classes. Birnbaum (7) showed, for factorially 
homogeneous items whose trace lines are logistic curves, that a certain 
weighted average of a respondent’s item scores is a sufficient statistic for 
estimating his score on the underlying common factor. Lazarsfeld (46) 
discussed various aspects of latent structure models. Hays and Borgatta 
(41) found that the general three-parameter solution in the latent distance 
model is to be preferred in practice to the restricted two-parameter 
solution. 

The present reviewers were disappointed by the absence of substantive 
applications of the Lazarsfeld models in the available literature. 


Criteria for Scale Evaluation 


Traditional psychometric theory recognizes multiple criteria for evaluat- 
ing a measurement procedure. Scaling theory should do likewise to avoid 
futile bickering as to whose numbers are on the side of the angels. Replica- 
bility of scales in repeated experiments is a universally recognized criterion 
of scale goodness. Beyond this, three other criteria by which to evaluate 
a psychological scaling technic should be mentioned: (a) Internal consist- 
ency. Does part of the scaling data appropriately predict remaining scale 
relations? Indeed, does the model provide any means for making such 
predictions? (b) Coordinating power. Does the scaling technic yield a
body of empirical scales which can be organized coherently? (c) Ap- 
propriateness as a psychological model. Is the subject’s behavior treated 
consistently with what is known about similar human behavior in other 
psychological contexts? 

In the opinion of the reviewers, the most crucial differential criterion of 
the three is the last, for it is on the basis of their implications for general 
psychological theory that different scaling models will ultimately prove 
their relative merits. We propose a tentative classification of existing scal- 
ing methods according to their psychological model of what the individual 
judge or respondent is “really” doing. 

1. Deterministic models: Guttman, Coombs, Stevens, Siegel (69), David- 
son-Suppes (20). The individual has a fixed “real” scale of judgment or 
preference, and his verbal reports reflect “real” metric properties. 

2. Stochastic models 

a. Roulette wheel or urn models: Lazarsfeld, Bradley-Terry (11), and 
possibly Luce (51). The individual acts as tho he consults a table of 
random numbers before making certain choices. The scale unit is a proba- 
bility measure. 

b. Confusion models: Thurstone and developments in his tradition. 
The individual acts as tho his momentary impression of the stimulus is 
subject to random fluctuations. The scale unit is a discriminal dispersion. 

c. Dynamic vacillation models: No existing models altho Coombs 
(18) implied such a formulation. The individual acts as tho he himself 
shifts his standards of judgment or preference from moment to moment. 
The scale unit would presumably be a dispersion unit. 

d. Noncommittal models: Suppes (75), Davidson-Marschak (19), 
and possibly Luce (51). The axiomatic systems postulate stochastic be- 
havior without seeking an underlying psychological “explanation.” 

Deterministic and stochastic models each have their difficulties. The 
deterministic models are forced to make arbitrary assumptions about 
error, while the stochastic models often pool data from different individuals 
even tho for such areas as attitudes, personality, and preferences, the as- 
sumption of subject homogeneity is dubious. 


CHAPTER VIII
Research Tools: Statistical Methods 


WILLIAM B. MICHAEL, HENRY F. KAISER, and CHERRY ANN CLARK 


That the rate of growth of statistical methodology has been nothing
short of amazing during the past three years is immediately apparent to 
anyone who examines the amount of research published in the journals. 
Even within the somewhat restricted area of coverage represented by this 
chapter more than 750 references were located of which slightly more than 
200 were included. In view of space limitations in the Review it appears 
that chapters on research and statistical methodology must be increasingly 
selective. Altho rigid criteria were not set in the choice of books and 
articles to be reviewed, an attempt was made to cite those publications, 
relative to the scope of this chapter, that represent (a) a significant 
modification of aspects of statistical theory and/or (b) an important 
contribution of particular relevance to the analysis of data associated 
with research problems in education. 

The last comprehensive single chapter on developments in statistical 
theory was the excellent one by Johnson and Moonan (105) in the 
December 1951 Review. The December 1954 issue covered most of the 
publications in statistical methodology from 1951 to 1954. The portions
of this chapter dealing with nonparametric methods, regression and cor- 
relation technics, and factor analysis represent extensions of the cor- 
responding chapters by Blum and Fattu (22), Hoyt and Johnson (102), 
and Solomon and Rosner (179) in the December 1954 issue, and cover 
the three-year period following July 1, 1954. In addition, attention is 
given to a few earlier noteworthy contributions not previously noted in 
the Review.

It should also be mentioned that a number of contributions to statistical 
methodology have been omitted in addition to those represented by the 
content of other chapters in this issue. In particular those statistical 
methods that seemed to be especially applicable to problems of test con- 
struction, analysis, and evaluation were deferred for a chapter in a sub- 
sequent issue on educational and psychological testing. Thus, many con- 
tributions in the area of regression and correlation as well as recent writ- 
ings on pattern and profile analysis have not been included. Largely 
because of the somewhat limited amount of material available during the 
past three years, papers concerned with decision theory, the discriminant 
functions, sequential analysis, classification, and sociometric problems 
were not considered except incidentally. It would seem probable that 
within the next three years a sufficient accumulation of material on 
decision theory and discriminant analysis will be forthcoming to allow 
inclusion of a chapter on these topics for the 1960 issue of the Review. 

This chapter is organized as follows: After a review of recent books in 
statistics that are of interest and help to research workers in education, 
major consideration will be given to (a) general developments in statistical
theory with particular emphasis upon contributions to statistical inference 
involving parametric procedures; (b) recent contributions concerning 
chi-square and contingency tables as well as related topics; (c) nonpara- 
metric methods including some material on measures of correlation and 
association; (d) regression and correlation technics primarily viewed in 
a parametric setting; and (e) factor analysis in which, in view of space 
limitations, only methodological advances are treated. 


Books 


Altho during the period reviewed, many of the significant books per- 
taining to statistical methodology were devoted to experimental design, 
several other books in statistics were published that are of interest to 
research workers in the behavioral sciences. In statistical inference a 
potential classic is the volume by Fisher (62). Among introductory books 
of general scope are those by Adams (1), Dixon and Massey (52), Hoel 
(98), Li (122), Snedecor (178), and Wallis and Roberts (204). Of these 
texts the two by Dixon and Massey and Wallis and Roberts probably 
placed the least premium on background in mathematics altho two or 
possibly three years of college mathematics should suffice for compre- 
hension of the material in any of the other volumes cited. Of all the 
books mentioned, perhaps the most useful one to the individual doing 
applied research is the revised edition by Dixon and Massey which, to 
say nothing of the unusual clarity of exposition of a variety of standard 
topics, contains 33 different tables. Other useful tables were compiled by 
Pearson and Hartley (148) from those appearing in various issues of 
Biometrika. 

Probably the most elementary and perhaps one of the most readable 
statistics texts is that by Underwood and others (203). In addition, new 
volumes appeared by Cornell (43), Edwards (59), Tate (187), and Wert, 
Neidt, and Ahmann (209); they should be useful in two- or three- 
semester courses for upper division or graduate students in the behavioral 
sciences. Guilford (73) and McNemar (128) revised their well-known 
texts. As teaching aids to accompany the Guilford volume, Guilford and 
Michael (75, 76) prepared two workbooks that contain problems requiring 
step-by-step solutions as well as answers. 


Important contributions of a more specialized nature were to be found 
in the areas of nonparametric statistics and factor analysis. In the former 
field, Fraser (65) wrote a mathematically rigorous text in nonparametric 
statistics which only a handful of specialists in the behavioral sciences will 
be able to comprehend. On the other hand, Siegel (177) prepared a com- 
prehensive, lucid text; especially oriented to the needs of research workers 
in the behavioral sciences, it is probably the best current source of material 
on nonparametric methods for readers of the Review. 
In factor analysis Adcock (2) provided a brief and elementary book
for the student with minimal training in mathematics. Relatively elemen- 
tary and well suited to a three- or four-semester-hour course in factor 
analysis is Fruchter’s text (66) that is essentially Thurstonian in its 
emphasis. An excellent chapter on factor analysis is to be found in Guil- 
ford’s revised edition of Psychometric Methods (74), a volume which also 
treats several other quantitative methods from a somewhat applied point 
of view. 

Not to be overlooked is the first volume of the Handbook of Social 
Psychology, edited by Lindzey (123), which contains several chapters on 
different aspects of statistical methods. Extremely useful sources of mate- 
rial on statistical theory and research design are the chapters that were 
prepared by Jones (108), Moses (138), and Gardner (67), respectively, 
in the 1955, 1956, and 1957 volumes of the Annual Review of Psychology. 

Perhaps the one most intriguing book in statistical methodology during 
the past few years is Meehl’s short but important Clinical vs. Statistical 
Prediction (131). In pitting actuarial and clinical methods against each 
other, the author took 20 studies in which both approaches were used to 
predict future behavior and compared the relative merits of each approach. 


General Developments Primarily in 
Parametric Statistics 


Outside the areas of regression and correlation and analysis of variance 
and covariance, published research tended to be nonparametric rather 
than parametric in its emphasis. Nevertheless a few noteworthy contribu- 
tions to parametric theory appeared. Two highly readable papers of 
general interest were those by Chernoff (32) and by Tukey (202). 

Chernoff presented a selective review of large-sample parametric theory. 
He also considered aspects of inference relative to the emphasis of the 
maximum likelihood principle in estimation and the use of optimal designs 
for estimating parameters and for testing both simple and composite 
hypotheses against appropriate alternatives. In his stimulating paper 
Tukey pointed to a number of unsolved problems of experimental statistics. 

Contributions to statistical inference were made by Halperin (88) who 
presented charts for estimating parameters when sampling has occurred 
for a singly truncated normal distribution, and by Raj (153) who treated 
the estimation of parameters of a Type III population from singly and 
doubly truncated samples. Thru the use of a double sampling procedure 
for the estimation of population means relative to a given confidence level, 
Seelbinder (174) described a method for ascertaining the size of the first 
sample stage upon which the second sample stage is dependent; he fur- 
nished tables for determination of the optimal size for first-stage samples 
in terms of four different confidence coefficients. Owen (144) presented 
a double sampling procedure for testing the mean of a normal distribution; 
his method does not require so many observations as conventional single 
sample tests for maintenance of the same degree of power. Readily
extended to the case of the difference between two means, the test was 
treated from the standpoint of both known and unknown standard devia- 
tions, and its use was facilitated thru accompanying tables. In certain 
respects these double-sampling procedures constitute a simple decision- 
making process not unlike a rudimentary sequential analysis. Altho the 
topic of sequential analysis is outside the scope of this chapter, the 
interested reader is referred to a comprehensive article written by Fiske 
and Jones (63) especially for an audience of psychological research 
workers. 

In a comprehensive paper Proschan (151) discussed at length the 
similarities and differences between confidence and tolerance intervals 
for the normal distribution in various cases of known and unknown means 
and standard deviations. Noether (142) treated two confidence intervals 
for two different expressions of corresponding ratios of probability esti- 
mates that may be interpreted as measures of effectiveness. 

In two related articles Chernoff and Lieberman (34, 35) considered the 
use of probability paper. In their first article, the writers presented tables 
for samples up to size 10 as an aid to the selection of ordinates on normal 
probability paper that will permit “optimum” graphical determination of 
the mean and standard deviation of a normal distribution. Extending their 
problem to the case of a general continuous distribution with finite 
variance that is completely specified except for location and scale param- 
eters, the writers in their second paper stated in abstract terms neces- 
sary and sufficient conditions to guarantee optimal estimates not only of 
the scale parameters, but also of each of the percentiles. 

Bridging to some extent the gap between parametric and nonparametric 
approaches are order statistics (e.g., percentiles or linear combinations 
thereof) used to estimate the parameters of populations of specified form. 
The lucid exposition of order statistics and the extensive tables furnished 
by Dixon and Massey (52) are adequate for the purposes of many 
educational research workers; in addition, important theoretical contri- 
butions to the estimation of parameters from combinations (usually 
linear) of order statistics appeared. In a series of closely related papers 
Sarhan (163, 164, 165, 166) employed order statistics in estimating 
parameters of various types of distributions, and subsequently Sarhan 
and Greenberg (167) wrote a theoretical paper concerning the use of 
order statistics in the estimation of parameters in singly and doubly 
censored samples. Relative to the problem of missing or censored observa- 
tions, Sarhan and Greenberg (168) published tables for estimation of 
parameters for samples up to 10 in size. For samples of size 20 and less 
from a normal distribution Teichroew (193) reported the expected values 
of order statistics and of the products of order statistics, and furnished 
tabulations. 

Additional miscellaneous contributions to statistical theory were Godwin’s
expository and detailed paper (69) on generalizations of inequalities
of the type set forth by Tchebychef as applied to a variety of distribution
functions, Tukey’s suggestions (201) for maintaining simplicity
in moment-like sampling computations, Sandler’s arithmetical simplifica- 
tion (162) of the t-test of the significance of the difference between cor- 
related means, and Teichroew’s listing (192) of a number of unpublished 
statistical tables prepared by members of the Numerical Analysis Institute 
of the University of California at Los Angeles. Of possible interest to 
people in the mental hygiene field was the development by Marshall and 
Goldhamer (129) of three statistical models based upon application of 
Markov processes that may be used in the study of such variables as the 
age of onset of a psychosis and the age of admission to a mental hospital. 
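
The simplification credited to Sandler (162) is commonly stated in terms of a statistic A, the sum of the squared paired differences divided by the square of their sum, which is related to the t for correlated means by t^2 = (n - 1)/(nA - 1). The following Python sketch checks that identity numerically. The function names and the sample differences are illustrative assumptions and are not taken from the original paper.

```python
import math


def paired_t(diffs):
    """Conventional t for correlated means, computed from paired differences."""
    n = len(diffs)
    mean = sum(diffs) / n
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(variance / n)


def sandler_a(diffs):
    """A = (sum of squared differences) / (square of the summed differences).

    Undefined when the differences sum to zero (t is then zero as well).
    """
    total = sum(diffs)
    return sum(d * d for d in diffs) / (total * total)


def abs_t_from_a(a, n):
    """Recover |t| from A through t^2 = (n - 1) / (n*A - 1)."""
    return math.sqrt((n - 1) / (n * a - 1))


# Hypothetical paired differences (after minus before), for illustration only.
example_diffs = [2.0, -1.0, 3.0, 4.0, 0.0, 1.0]
example_a = sandler_a(example_diffs)
example_t = paired_t(example_diffs)
```

Small values of A correspond to large values of |t|, so a table of critical values of A is entered in the opposite direction from a table of t.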

Finally mention should be made of a running controversy concerning 
the use of either one-tailed or two-tailed significance tests that appeared 
in at least seven different articles in the Psychological Review and Psycho- 
logical Bulletin between 1951 and 1954. In two reviews Jones (108) and 
Moses (138) cited pertinent references and summarized and evaluated 
the arguments presented in the original sources.


Chi-Square, Contingency Tables, and Related Topics 


Extending his treatment in an earlier comprehensive article (36) dealing 
with chi-square, Cochran (37) wrote a highly substantive paper con- 
cerning ways in which applications of common chi-square tests may be 
strengthened. He discussed such problems as goodness of fit, subdivision 
of degrees of freedom in the detection of a linear or other type of trend 
in a contingency table, and over-all significance tests for combinations of 
2 x 2 contingency tables. After making the point that chi-square as a test 
of goodness of fit gives us no indication of how a null hypothesis fails 
because it is not directed against any particular pattern of deviations of 
the observed from the expected frequencies, Cochran (38) suggested a new 
procedure called the L-test, which is a linear function of the deviations 
between a set of observed frequencies and another set of corresponding 
frequencies so chosen in advance that L will be sensitive to the alternative 
hypothesis thought most likely to hold. Altho approximate except in the 
asymptotic case, the L-test can be made responsive to any specified pattern 
of deviations with respect to either their signs or magnitudes. 

An important paper also concerned with goodness of fit was that by 
Chernoff and Lehmann (33) who demonstrated that chi-square for grouped 
frequencies follows the calculated chi-square distribution provided the 
estimates of the parameters employed in the calculation of expected fre- 
quencies constitute maximum likelihood estimates. When the sample 
mean and standard deviation are used in the test for normality of distri- 
bution, there is a tendency for the value of chi-square to be overestimated 
especially in the instance of a small number of cells. In a recent short 
article concerning the interpretation of chi-square tests with respect to a 
study of the preference of 31 subjects for two drugs administered on two 
occasions, Armitage and Healy (8) compared goodness-of-fit tests and
variance tests with exact tests and showed in their tabulation that the 
probability values closest to the .05 point turned out to be .018, .064, and 
.0415, respectively, for the three approaches. 

Berkson (16), dealing with a rather specialized problem in bio-assay 
work involving the logistic function, reported the results of a series of 
experiments that cast doubt upon the widely held view of the efficiency of 
maximum likelihood estimates when compared with the minimum Pearson 
chi-square estimates. In particular his findings conflicted with the more or 
less commonly accepted principles that (a) a sufficient estimator is either 
unique or functionally related to a maximum likelihood estimator and 
(b) the maximum likelihood estimator in the instance of asymptotically 
efficient estimators will extract the greatest amount of information from 
the data. 


Significance Tests 


Relative to the performance of significance tests of individual 2 x 2 
contingency tables, several helpful papers were published. Applicable not 
only to fourfold tables, but also to those problems the data for which are 
compared with the chi-square model, are the important tables prepared by 
Lewis (121) that furnish the 0.1 and 99.9 percent points of chi-square 
over an extensive range of degrees of freedom. Tables prepared by
Finney (61) are useful in yielding an exact test of association for 2 x 2 
tables at .05, .025, .01, and .005 levels of significance in the instance in 
which both frequencies present in one of the margins are less than or equal 
to 15; Latscha (120) extended these tables so that both marginal fre- 
quencies go up to 20. In the instance of a 2 x 2 contingency table that 
includes up to a total of 50 observations Armsen (9) provided, for both 
one-tailed and two-tailed tests at the .05 and .01 levels, a set of exact 
probability tables derived from use of a hypergeometric formula. 
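
The exact tests tabled in the papers just cited rest on the hypergeometric probability of a 2 x 2 table whose margins are regarded as fixed. The Python sketch below shows the computation for a single table. The function names are hypothetical, and the two-tailed rule used here (summing every table no more probable than the observed one) is one common convention assumed for illustration, not necessarily the rule adopted in any of the tables cited.

```python
from math import comb


def table_probability(a, b, c, d):
    """Hypergeometric probability of the table [[a, b], [c, d]], margins fixed."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)


def exact_one_tailed(a, b, c, d):
    """Probability of an upper-left cell at least as large as the observed a."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = 0.0
    for x in range(a, min(row1, col1) + 1):
        total += table_probability(x, row1 - x, col1 - x, n - row1 - col1 + x)
    return total


def exact_two_tailed(a, b, c, d):
    """Sum of the probabilities of all tables no more probable than the observed one."""
    row1, col1, n = a + b, a + c, a + b + c + d
    p_observed = table_probability(a, b, c, d)
    total = 0.0
    for x in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        p = table_probability(x, row1 - x, col1 - x, n - row1 - col1 + x)
        if p <= p_observed * (1 + 1e-9):  # small tolerance for rounding error
            total += p
    return total
```

Because every term is a ratio of binomial coefficients, no factorials need be evaluated directly, which is in the spirit of the shortcut proposed by Sakoda and Cohen (160).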

Making use of binomial coefficients instead of factorials in order to 
shorten the calculation of exact probabilities for either 2 x 2 or 2 x r
contingency tables, Sakoda and Cohen (160) furnished a table of bi- 
nomial coefficients accurate to four significant figures for n between unity 
and 60 as well as a set of inequalities for estimation of cumulative proba- 
bilities in the tail of the binomial distribution relative to a given set of 
entries in a contingency table. Another group of tables very useful when 
the proportions $p_1$ and $p_2$ of individuals in each of two samples belonging
to a given category are very small (altho the numbers of individuals
$N_1$ and $N_2$ in the two samples are relatively large) was prepared by
Patnaik (146) on the hypothesis that the two samples are drawn from a 
common Poisson population. 

Quick graphical methods for the evaluation of the significance of 
entries in 2 x 2 contingency tables were described in two papers by Bross 
and Kasten (26) and by Trites (197). In the first mentioned paper, charts 
are furnished that permit both one-tailed and two-tailed significance tests 
without the need for calculation of chi-square provided the proportion of
cases in either of two samples being compared is between .10 and .90 of 
the combined number of cases in the two samples. Altho minimizing com- 
putational effort, the sets of curves in the second paper are strictly 
applicable only when the numbers in each of the two samples are the 
same.

As to significance tests involving proportions and percentages Gengerelli 
and Michael (68) proposed a procedure for evaluating the reliability of 
the difference between proportions and for setting up confidence intervals. 
In a paper also concerned with the estimation of binomial parameters 
Bross (24) devised a method based on the construction of a confidence 
interval for ascertaining whether a sample proportion $p_2$ is significantly
larger by a certain percentage than another sample proportion $p_1$ as
given by $\delta = 100 (p_2 - p_1)/p_1$ upon the assumption that both $p_1$ and
$p_2$ are sufficiently small and that the samples are sufficiently large to
permit use of a Poisson approximation. 

Crow (46) described how confidence intervals could be established for 
a proportion. Rao and Chakravarti (155) developed significance tests for 
the Poisson distribution in the case of small samples. 


Combining Independent Tests of Significance 


For the important problem of evaluating a series or combination of 
independent significance tests, Jones and Fiske (109) proposed and de- 
scribed in detail a binomial model and a chi-square model involving a 
logarithmic transformation of the product of the independent probabilities 
p associated with each of the k number of independent significance tests. 
That each of the sets of measures must be independent was emphasized 
in application of the models altho substitute procedures were suggested 
when statistical independence could not be assumed. Well adapted to the 
calculation of numerical values of chi-square arising from use of the 
second model was a set of 2 x k tables prepared by Gordon, Loveland, 
and Cureton (71). Perhaps of even more help was the appearance of 
two nomographs prepared by Sakoda, Cohen, and Beall (161) that furnish 
chance probabilities of obtaining n or more statistics significant at the 
.05 or .01 levels, respectively, from N calculated statistics. 
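
The two models described in this paragraph can be sketched in Python. For the chi-square model, minus twice the natural logarithm of the product of k independent probabilities is referred to chi-square with 2k degrees of freedom; for the binomial model, the chance probability of n or more significant results among N independent statistics is an upper binomial tail. The function names are hypothetical, and the closed-form tail used for chi-square holds only for even degrees of freedom, which is the only case arising here.

```python
import math


def chi_square_upper_tail_even_df(x, k):
    """P(chi-square with 2k degrees of freedom > x), by the closed-form series."""
    half = x / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total


def combine_probabilities(pvalues):
    """Chi-square model: -2 * sum(ln p), referred to 2k degrees of freedom."""
    k = len(pvalues)
    statistic = -2.0 * sum(math.log(p) for p in pvalues)
    return statistic, chi_square_upper_tail_even_df(statistic, k)


def prob_n_or_more_significant(n, total_tests, alpha):
    """Binomial model: chance of n or more results significant at level alpha."""
    return sum(
        math.comb(total_tests, j) * alpha ** j * (1 - alpha) ** (total_tests - j)
        for j in range(n, total_tests + 1)
    )
```

Both functions assume that the tests being combined are statistically independent, the condition emphasized by Jones and Fiske.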

For the situation in which an investigator wishes to test the over-all 
significance of a set of 2 x 2 contingency tables, Yates (215, 216) cited 
reasons why he believed the previously mentioned combination of proba- 
bilities test (which is compared with the chi-square distribution consisting 
of 2k degrees of freedom) is not too efficient relative to a maximum
likelihood solution involving the use of some appropriate transformation,
but nevertheless concluded in his second paper that “chi-square without
correction for continuity will give one-tail probabilities for 2 x 2 tables
which may be safely combined in most cases” encountered in practice. 
In his previously cited paper Cochran (37) considered the circumstance 
in which the investigator is interested in ascertaining an over-all trend of 
differences in a series of independent pairs of proportions arising from
some natural ordering of events and proposed a significance test for the 
corresponding combination of 2 x 2 contingency tables, a test that antici- 
pated an equivalent procedure suggested by Armitage (7). 

Writing a comprehensive theoretical paper concerning the combination 
of independent tests of significance, Birnbaum (17) evaluated several 
different approaches that had been proposed and concluded that since 
no single method of combination is in general optimal, attention should 
be given to the kinds of tests to be combined in the selection of a method. 
However, for one kind of common problem it was shown that methods by 
Fisher and by Tippett possessed an optimal property. Finally for the 
reader desirous of an overview of the problem of combining independent 
tests of significance, the detailed three-page discussion by Mosteller and 
Bush (140) may well be the most helpful single source. 


Miscellaneous Developments 


Two noteworthy contributions regarding the separation of the total 
chi-square in contingency tables into components may be briefly cited. 
Kimball (118) proposed short-cut computational formulas that allow 
the partition of the chi-square value in a multifold contingency table of 
r rows and s columns into the chi-squares associated with various (r - 1)(s - 1)
individual 2 x 2 tables. After drawing an analogy between factorial
experiments involving use of analysis of variance and multiple
contingency tables in which the nature of measurements prevents their
being scaled, Sutcliffe (185) furnished a model for the partition of chi- 
square in multiple contingency analysis into components associated with 
each of the main effects and with various orders of interaction, a model 
which is detailed relative to different types of restriction dependent upon 
whether parameters are known or estimated. 

Bross (25) indicated that when errors of classification arise in 2 x 2 
tables as in clinical diagnosis, such errors may be relatively more serious 
in their influence on the estimation of parameters than is their attenuation 
of the accuracy of significance tests (as in the instance of the difference 
between two sample proportions) altho the power of the significance test 
will suffer. For the situation in which frequencies are missing in con- 
tingency tables, Watson (206) proposed a “missing value” formula based 
on the principle of maximum likelihood to compute estimates for all 
unavailable frequencies and also considered the problem of “mixed-up” 
frequencies. 

What may be considered a critical overview of the use of contingency 
tables will be found in a paper by Goodman and Kruskal (70) and in one 
by Mayo (130). The first two writers in a lengthy discussion proposed 
several new indexes for contingency tables (i.e., those of multiple and 
partial association in more complex arrangements) and distinguished 
between ordered and unordered variables and symmetric and unsym- 
metric problems upon the assumption of no underlying continuum being 
present. Mayo (130) considered some recently developed technics for
the analysis of association in contingency tables, illustrating his procedures 
with data from a follow-up survey of education graduates. He discussed 
approximate tests of significance for detection of departures of a specified 
type such as the presence of correlation or regression in certain parts of 
a contingency table, tests of higher order interactions, and exact tests of 
significance in the instance of small sample data or in the case of the 
presence of small or even zero theoretical frequencies in contingency 
tables. 

In order to meet the condition in which nonindependence of the 
marginal distribution of a two-way m x m classification may exist, Stuart 
(184) devised a large sample test for the homogeneity of two marginal 
distributions. In still another paper in which the presence of correlation is 
an important factor Stuart (181) furnished a means for comparing the 
frequencies (with respect to possession of a given attribute) in matched 
samples. 

Several papers concerning Poisson processes and the binomial dis- 
tribution appeared in addition to those previously cited. In a compre- 
hensive article Birnbaum (18) described and illustrated statistical methods 
that can be applied to Poisson and experimental processes. For samples 
that are truncated and censored, Cohen (39) proposed maximum likeli- 
hood estimates of Poisson parameters and explained how existing tables 
could be used to apply the formulas to practical problems. As a means of 
stabilizing variances and normalizing distributions Blom (21) compared 
various transformation procedures for the binomial, negative binomial, 
Poisson, and chi-square distributions. To describe the theoretical extent 
of accident proneness in terms of two independent periods of accident 
observation, Webb and Jones (208) proposed what they believed to be 
two operationally comparable models, the Poisson distribution and the 
binomial bivariate (correlation) method, the mathematical equivalence of 
which was proved by Burke (27). In addition, striking relationships to the 
Spearman-Brown reliability estimate were demonstrated. 


Developments in Nonparametric Statistics 


In addition to the previously cited writings on nonparametric statistics 
by Siegel (177) and by Mosteller and Bush (140), mention should be 
made of a very simple, lucidly expressed, nontechnical article by Siegel 
(176). Several other general contributions to the logic and theory of 
nonparametric statistics that either represent important advances from a 
mathematical standpoint or suggest problems in need of solution should 
be cited. 

Using set theory, Bahadur and Savage (11) demonstrated the absence 
of certain statistical procedures in nonparametric problems. It was argued 
that if the distribution of a real random variable in a population is totally 
unknown, little or no information about the tails of the population can 
be furnished by a sample even if it has been drawn according to a
sequential method. In their treatment of problems of inference about the 
population mean the writers pointed out that no effective test of the 
hypothesis of the population mean being equal to zero can be made, that 
no confidence interval can be established, and that no point estimate 
is possible since the parameter is sensitive to the tails of the population 
distribution. Concerned with the glibness or apparent lack of interest 
that statisticians show in treating ties in nonparametric procedures, Putter 
(152) showed that when ties are considered as being random variables 
as compared with their being treated as nonrandomized variables, the 
exact power and asymptotic efficiency of the tests are reduced. In essence 
he proposed a nonrandom model for treatment of ties. 

In a valuable theoretical paper on rank order statistics Savage (170) 
pointed out that in nonparametric work it is seldom possible to apply all 
the basic ideas and related concepts pertaining to the testing of alternative
hypotheses underlying the parametric procedures of Neyman and
Pearson since for many of the alternatives considered in nonparametric
methods neither optimum critical regions nor analytic technics for determining
power functions may exist. For the two-sample case Savage considered
alternatives involving monotone likelihood ratios and presented a
necessary criterion for admissibility. Another theoretical contribution was 
Dwass’s paper (57) concerning the distribution of ranks and of chosen 
rank order statistics. Subsequent to developing the moment generating 
functions associated with two independent sets of ranks from two
possibly different populations, he showed the Wilcoxon statistic to be a 
special case of the general distribution statistic of rank orders and 
demonstrated that for certain combinations of sample sizes and parent 
populations the limiting distribution is nonnormal. 


Tests of Significance 


As might be expected, several new nonparametric significance tests 
or adaptations of existing tests appeared. In the comparison of two 
samples for significance Moses (139) proposed a test, based on the ranks 
of observed values, that is sensitive to the presence of extreme scores 
(either large or small or both large and small). The test was devised to 
take care of the situation involving experimental and control groups in 
which the effect of the experimental variable may be reflected by an 
increase in the scores of some subjects and by a decrease in the scores 
of other subjects. Employing a criterion based on the rank ranges of two 
samples, Kamat (117) presented a new distribution-free test for samples 
with respect to which it is assumed that no difference exists in the location 
parameters of the two population distributions. 

For the circumstance in which two variables are measured on each 
individual, Hodges (96) developed a bivariate analog of the two-sided 
sign test. In testing the null hypothesis that the bivariate distribution for 
one pair of variables $(x_i, y_i)$ is identical with that of a second pair of
variables $(x_i', y_i')$ on the assumption that individual vectors are independent,
attention is directed toward whether in a second circumstance
the bivariate distribution has shifted relative to its position in the first 
circumstance in a direction generally the same for all individuals. In 
presenting a new distribution-free test along with appropriate tables of 
significance and power for the hypothesis that given values exist for all 
regression parameters, Daniels (49) assumed that the probability of the 
signs of the independent residuals being positive (or negative) is one- 
half. Altho his test statistic was determined explicitly for two parameters, 
it can be extended in principle to any number of variables. 

Several modifications in, or additions to, existing significance tables 
appeared. To assist the research worker who wishes to make use of the 
G-test (an approximation to the t-test, the denominator of which is 
replaced by the average range of the combined number of samples m), 
Jackson and Ross (103) presented a set of tables that yield 10, 5, and 1
percentage points in two-tail tests for situations permitting various num- 
bers of m random subgroups each of size n. For a nonparametric test of 
location Rosenbaum (159) prepared an extensive set of tables that merely 
involve a count of the number of points in one sample that fall outside 
an extreme value of the second sample. For determination of significance 
probabilities associated with the Wilcoxon test, Fix and Hodges (64) 
furnished tables that cover a range between 2 and 12 for the smaller 
sample size. Percentage points for the Kolmogorov statistics are available 
in a table prepared by Miller (134). In the instance of a distribution- 
free test of goodness of fit, Anderson and Darling (6) tabulated large 
sample significance points. 
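
Significance probabilities of the kind tabled for the Wilcoxon test can be obtained by counting. Under the null hypothesis, and assuming no ties, every choice of m ranks out of m + n for the smaller sample is equally likely, so the lower-tail probability of a rank sum is a ratio of counts. The Python sketch below illustrates that definition with hypothetical function names; it is not a reconstruction of the method used by Fix and Hodges to compute their tables.

```python
from math import comb


def rank_sum_counts(m, n):
    """counts[s] = number of ways m of the ranks 1..m+n can sum to s."""
    total_ranks = m + n
    max_sum = sum(range(total_ranks - m + 1, total_ranks + 1))
    # table[j][s] = number of j-element subsets of the ranks processed so far
    # whose sum is s; each rank is used at most once because j runs downward.
    table = [[0] * (max_sum + 1) for _ in range(m + 1)]
    table[0][0] = 1
    for rank in range(1, total_ranks + 1):
        for j in range(min(m, rank), 0, -1):
            for s in range(max_sum, rank - 1, -1):
                table[j][s] += table[j - 1][s - rank]
    return table[m]


def wilcoxon_lower_tail(w, m, n):
    """P(rank sum of the sample of size m <= w) under the null hypothesis."""
    counts = rank_sum_counts(m, n)
    return sum(counts[: w + 1]) / comb(m + n, m)
```

With m = n = 3, for example, the smallest possible rank sum is 6, and only one of the 20 equally likely assignments of ranks produces it.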


Power and Efficiency 


The problem of the power and efficiency of nonparametric tests was the 
stimulus for several studies. In a comprehensive paper on the asymptotic 
efficiencies of five two-sample nonparametric tests against normal alterna- 
tives to the null hypothesis, Mood (135) determined asymptotic efficiencies 
of $3/\pi$ (about 95 percent), $2/\pi$ (about 64 percent), and zero, respectively,
for the Wilcoxon test for location, the median test for location, and two
run tests. For a square rank test of dispersion an efficiency of $15/(2\pi^2)$
(about 76 percent) was reported. 
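
The percentages quoted above follow directly from the closed-form expressions, as the short Python check below shows. The dictionary and its keys are hypothetical labels introduced for illustration.

```python
import math

# Asymptotic efficiencies against normal alternatives, as quoted in the text.
asymptotic_efficiency = {
    "wilcoxon_location": 3 / math.pi,
    "median_location": 2 / math.pi,
    "square_rank_dispersion": 15 / (2 * math.pi ** 2),
}

# The same values expressed in percent, rounded to one decimal place.
efficiency_percent = {
    name: round(100 * value, 1) for name, value in asymptotic_efficiency.items()
}
```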

For samples of various sizes Dixon (50) tabulated the power functions 
of the sign test when $\alpha$ is near .05 and .01 and employed a power efficiency
function, which is the ratio of the size of sample drawn from a normal
population when the t-test is used to the size of sample in the nonparametric
test if equal power is maintained relative to a given alternative.
The tabulated results demonstrated that the power efficiency decreased 
as the size of sample increased, as the level of significance increased, and 
as the extent of departure δ of the alternative hypothesis from the null
hypothesis increased. A second paper by Dixon (51) concerning the 
power efficiency of four nonparametric tests for two samples of size m =
n = 3, 4, 5 against normal alternatives revealed, in terms of exact results,
that the most efficient test is the Wilcoxon (rank sum) test followed by 
the Kolmogorov-Smirnov (maximum absolute deviation) test, and the 
median test. As the value of the alternative hypothesis becomes more 
remote from that of the null hypothesis, the efficiency tends to drop only 
slightly, but relatively more for the first two tests than for the median 
test. 
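
Power functions of the sign test against normal alternatives can be obtained from exact binomial sums. The sketch below is a minimal illustration, assuming unit-variance normal observations whose mean is shifted by an amount `delta` and a two-sided test; the function names are hypothetical and the code is not taken from the cited papers. It computes only the power of the sign test itself, not the matching t-test power that a power efficiency ratio would also require.

```python
from math import comb, erf, sqrt

def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def binom_upper_tail(n, k, p):
    """P(S >= k) for S distributed Binomial(n, p)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j)
               for j in range(k, n + 1))

def sign_test_power(n, delta, alpha=0.05):
    """Return (size, power) of the two-sided sign test of median zero.

    Reject when the number of positive observations S satisfies
    S >= c or S <= n - c, with c the smallest count whose two-sided
    size does not exceed alpha.
    """
    c = next(k for k in range(n // 2 + 1, n + 2)
             if 2 * binom_upper_tail(n, k, 0.5) <= alpha)
    p = normal_cdf(delta)  # chance that one observation is positive
    size = 2 * binom_upper_tail(n, c, 0.5)
    power = binom_upper_tail(n, c, p) + binom_upper_tail(n, c, 1 - p)
    return size, power

size, power = sign_test_power(10, 1.0)
print(f"attained size {size:.4f}, power at delta = 1: {power:.4f}")
```

Because the binomial distribution is discrete, the attained size is usually below the nominal level, which is one reason tabulations of this kind are given for α "near" the conventional values.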

In studying the efficiency of nonparametric competitors of the t-test, 
Hodges and Lehmann (97) showed that the asymptotic efficiency of the 
two-sample Wilcoxon test relative to the t-test never falls below .864 and 
obtained the limit of a sequence of power efficiency functions for the sign 
test with respect to t in normal populations as the size of samples becomes
infinite. In other theoretical papers Tsao (199) proposed methods to 
approximate the distribution of ranks that could be used as a basis for 
the evaluation of the power of an arbitrary rank test, and Dwass (56) 
studied the large-sample power of certain rank order tests in two-sample 
problems against alternatives involving one parameter. 

In an empirical study of the power of the rank, median, and run tests 
in which ties are numerous Tate (186) found that the degree of power 
was highest for the rank test, slightly lower for the median test, and 
extremely low for the run test. A related contribution was that of 
Teichroew (191), who furnished empirically derived tables of power functions
of any rank order test of the hypothesis that the two samples come
from the same population when the sizes of the two samples are 2 and 3,
3 and 3, 2 and 4, or 3 and 4.


Tests of Randomness 


Concerning nonparametric tests for the hypothesis of randomness in 
a sequence of values several important contributions were made. Perhaps 
of greatest theoretical interest was Savage’s demonstration (171) of the 
independence between rank order statistics and symmetric statistics such 
as the t-test. He concluded that if random variables are independently
and identically distributed as in random samples, the statistics employed 
to test randomness and parametric features of the null hypothesis con- 
tinue to be independent even tho the parametric parts of the null hypoth- 
esis may be false. For example, one might test a set of observations for 
both randomness and normality. The nonparametric test of randomness 
would probably depend upon an alternative of interest relative to which 
a form of rank order correlation would be appropriate. However, the test 
of normality in which the classical chi-square test of goodness of fit 
should be employed involving the estimation of one parameter would be 
completely independent of the nonparametric test of randomness. 

Among related papers of interest upon randomness was one by Stuart
(183) who presented tables of the asymptotic relative efficiencies of 
various tests of randomness against normal regression alternatives. Thru 
making use of Kendall’s tau coefficient, Jonckheere (106) not only obtained
a general statistic S that could be used to test the extent of agreement
between hypothesized rank-order values for n objects or scores and
an entire set of observed rankings of the same n objects or scores by m 
judges, but also furnished a useful group of tables. As an alternative 
statistic Lubin (127) proposed one he designated as J as a rank order 
test for trend in correlated means. 

To test the hypothesis of randomness of a sequence of N observations,
or equivalently the hypothesis that N independent random variables possess
the same continuous distribution function, Cox and Stuart (45) pre- 
sented several practical quick sign tests for trend in location and dispersion 
altho they pointed out that even their best tests, which may be preferable 
to other simple tests in the literature, are less efficient than the rank 
correlation tests. In a subsequent note Cox (44) suggested that when an 
inefficient quick test is compared with an efficient test such as t, considera- 
tion should be given to the extent to which agreement is found in the 
application of the two tests to the same set of data. In describing a model 
based on the distortion of a sequence of random values, Barton and David 
(13) suggested its use as an alternative to the hypothesis of randomness, 
applied it to Spearman’s rho coefficient, and stressed its usefulness as an 
alternative to the random sequence associated with any bivariate criterion. 
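
A quick sign test for trend in location can be sketched in a few lines. The version below is the commonly described form that pairs the first half of the series with the second half; it is an assumption made for illustration and is not necessarily the variant the authors found best. The function name is hypothetical.

```python
from math import comb

def sign_test_for_trend(xs):
    """Pair each early observation with a late one and count the rises.

    Returns (positives, pairs_used, two_sided_p). Tied pairs are dropped,
    and the middle observation is ignored when the series length is odd.
    """
    n = len(xs)
    m = n // 2
    first, second = xs[:m], xs[n - m:]
    diffs = [late - early for early, late in zip(first, second)
             if late != early]
    k = len(diffs)
    positives = sum(d > 0 for d in diffs)
    # Under randomness the count of positive signs is Binomial(k, 1/2).
    tail = min(positives, k - positives)
    p_value = min(1.0, 2 * sum(comb(k, j) for j in range(tail + 1)) / 2 ** k)
    return positives, k, p_value

print(sign_test_for_trend([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]))
```

Only the signs of the paired differences are used, which is why such tests are quick but less efficient than rank correlation tests.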


The Tau Coefficient 


The nonparametric coefficient of association, Kendall’s tau coefficient, 
was the object of extensive study. Haberman (87) described distributions 
for the coefficient that were based on partially ordered systems. Previously 
Stuart (180) succeeded in establishing bounds for the sampling variance 
of the coefficient. With no assumptions about the nature of the scales of 
measurement (other than for the existence of ordinal values), the shapes 
of distributions involved, or the comparative standing or variability of 
different subgroups, Torgerson (196) generalized Kendall’s test of as- 
sociation between two sets of rank orders to the case in which the total 
sample consists of several subgroups of variable sizes and in which data 
on one or both of the variables to be correlated are made up of rank 
orders within each subgroup. Moreover, he demonstrated thru two em- 
pirical examples that a normal approximation to the exact test of signifi- 
cance will suffice in most practical situations. Both Torgerson (196) and 
Adler (3) proposed methods for handling ties. Challenging Schaeffer and 
Levitt (172) who in their comprehensive review of the literature on 
Kendall’s tau coefficient stated that “generally applicable tests of the significance
of any partial τ are not yet available,” Jones (110) presented a
significance test in the instance of a partial tau relative to which the 
partialed-out variable is nominal in its scale properties. 


Noteworthy advances were made in the development of improved 
methods of computation of tau such as the simplification proposed by 
Bright (23) and Cartwright (29). Reducing the labor involved in Bright’s
approach, Griffin (72) described a graphic method for calculation of the
coefficient. 
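
For readers who wish to see what is being computed, the basic definition of tau can be written out directly. This is a plain pair-counting sketch for untied data (often called tau-a); it is not any of the short-cut methods cited above, and the function name is illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau from concordant minus discordant pairs.

    Assumes two equal-length sequences; tied pairs contribute zero,
    and no correction for ties is applied.
    """
    score = 0
    for (x1, y1), (x2, y2) in combinations(list(zip(x, y)), 2):
        product = (x1 - x2) * (y1 - y2)
        score += (product > 0) - (product < 0)
    n = len(x)
    return score / (n * (n - 1) / 2)

print(kendall_tau_a([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```

The number of pairs grows as n(n - 1)/2, which explains the interest in computational simplifications when the work was done by hand.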

Several miscellaneous developments in nonparametric methods of in- 
terest to research workers in education and psychology were those of 
Jonckheere (107) who described a test of significance of the relationship 
between the predicted grading of n objects into k ordered categories and 
the corresponding rankings of the n objects by m judges, and of Cart- 
wright (30) who developed a quick estimate of multijudge reliability. 
Cureton (48) developed a formula for rank-biserial correlation which was 
shown to be equivalent to both Kendall’s tau and Spearman’s rho co- 
efficients. Stuart (182) demonstrated that for samples drawn from normal 
and from uniform distributions the correlations between variate values 
and ranks are .94 and .96, respectively, when N is equal to 25; the cor- 
relations have .98 and 1.00 as upper bound values as N increases. 


Regression and Correlation 


In correlation and regression theory a variety of problem areas was 
studied. In distribution theory Dunnett and Sobel (53) considered a multivariate
generalization of Student’s t distribution, and in treating the bivariate
case in detail, obtained exact and asymptotic expressions for the
probability integral as well as an asymptotic expression for certain per- 
centage points. In a comprehensive paper on multivariate distribution 
theory Olkin and Roy (143) derived the sampling distributions of a broad 
group of statistics directly from the probability law for random samples 
drawn from a multivariate normal population; they showed the application 
of their new derivation to situations concerned with canonical correlations 
and multiple and partial correlations. 

Another important contribution to multivariate analysis was Pillai’s 
proposal (149) of three new test criteria that were based on the character- 
istic roots of matrices derived from the product-moment matrices corres- 
ponding to samples chosen from multivariate normal populations. The 
approximate distributions of the statistics concerned were seen to conform 
to those of Type I or Type II Beta. In line with his statement that the 
distribution laws of random variables are possibly relative to the assump- 
tion of an appropriate kind of stochastic dependence on linear statistics, 
Laha (119) was able to characterize the normal distribution in terms of 
the properties of both linearity of regression and homoscedasticity of 
suitable linear statistics. The distribution of the regression coefficient in 
samples arising from a nonnormal population was the subject of a study 
by Hill (95) who in addition investigated the influence of departures from 
normality upon the significance levels furnished by two t-tests involving the 
regression coefficient. It would appear in the instance of samples of size 11 
or larger that the magnitudes of the discrepancies attributed to non- 
normality are constant and determinable from calculation furnished by 
knowledge of six cumulants. Finally, Harley (92) demonstrated a functional
relationship between the distribution of noncentral t and an expression
for a transformed correlation coefficient that arises from a population
in which the correlation is zero. 


Parameters 


The estimation of parameters in correlation and regression theory was 
a problem central to a number of papers. Using the method of maximum 
likelihood, Cohen (40) estimated the parameters of the bivariate normal 
distribution from restricted samples that arise when certain individuals 
have been eliminated in the selection process, and also illustrated his pro- 
cedures. In the instance of the trivariate normal population both Edgett 
(58) and Lord (124) independently employed maximum likelihood esti- 
mators for determination of the parameters when some of the sample 
observations for one of the variates are missing and found explicit solu- 
tions to the maximum likelihood equations. Another application of the 
principle of maximum likelihood for a multivariate normal distribution 
when some of the observations are absent was subject to abstract discussion 
by Anderson (5) for the simplest case concerning the bivariate normal 
distribution. For the situation in which all individuals in the population 
have fixed, identical values in each of the independent variables X₁, X₂, …,
Xₙ, Johnson (104) derived both a best linear estimate of the true mean
score on the dependent, or predicted, variable, which was in essence an 
average of all predicted values, and a standard error of estimate. 


Confidence Intervals 


Determination of confidence intervals in estimation problems involving 
regression was the subject of papers by Durand (54) and by Crow (47). 
After discussing the nature of two fallacies usually committed in the 
determination of conventional confidence intervals for the partial regres- 
sion coefficients of the multiple regression equation, Durand proposed a 
possible remedy thru the use of joint confidence regions that would seem 
more suitable to regression problems involving the estimation of several 
parameters. For the ordinate of the regression line or surface, Crow for- 
mulated a generalization of the confidence interval that permits the inde- 
pendent variable to assume values at random. 


Serial, Biserial, and Point Biserial Coefficients 


Important theoretical considerations concerning the biserial and the 
point biserial coefficient were treated in three papers by Tate (188, 189, 
190). Altho his findings are too numerous to be cited in detail, a few 
points may be mentioned. In his first paper Tate showed that the point 
biserial coefficient is t distributed when the parameter is equal to zero; in
his second article he proved that the asymptotic variance was a minimum 
for a fixed value of the parameter in the instance of a 50-50 split in the 
dichotomized variable; and in his third presentation the writer in citing 
applications of correlation models for biserial and point biserial coefficients
compared the circumstances under which one of the coefficients
would be preferred to the other; he recommended that if any doubt 
regarding the feasibility of a biserial coefficient should exist, one should 
then make use of the point biserial coefficient. 
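
The connection between the point biserial coefficient and t when the parameter is zero can be illustrated numerically. The sketch below uses the familiar textbook identity t = r√(n − 2)/√(1 − r²) and compares it with the pooled-variance two-sample t; the data and function names are made up for illustration and are not drawn from Tate's papers.

```python
from math import sqrt

def split(scores, groups):
    """Separate scores by a 0/1 grouping variable."""
    upper = [s for s, g in zip(scores, groups) if g == 1]
    lower = [s for s, g in zip(scores, groups) if g == 0]
    return upper, lower

def point_biserial(scores, groups):
    """Point biserial r: product-moment r between scores and a 0/1 variable."""
    upper, lower = split(scores, groups)
    n, n1, n0 = len(scores), len(upper), len(lower)
    mean = sum(scores) / n
    sd = sqrt(sum((s - mean) ** 2 for s in scores) / n)  # divisor n
    return (sum(upper) / n1 - sum(lower) / n0) / sd * sqrt(n1 * n0) / n

def t_from_r(r, n):
    """Convert a correlation to t with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)

def pooled_t(scores, groups):
    """Two-sample t with pooled variance."""
    upper, lower = split(scores, groups)
    n1, n0 = len(upper), len(lower)
    m1, m0 = sum(upper) / n1, sum(lower) / n0
    ss = sum((s - m1) ** 2 for s in upper) + sum((s - m0) ** 2 for s in lower)
    return (m1 - m0) / sqrt(ss / (n1 + n0 - 2) * (1 / n1 + 1 / n0))

scores = [2, 4, 5, 7, 8, 9]   # hypothetical test scores
groups = [0, 0, 0, 1, 1, 1]   # hypothetical dichotomy
r = point_biserial(scores, groups)
print(round(r, 4), round(t_from_r(r, len(scores)), 4),
      round(pooled_t(scores, groups), 4))
```

The last two printed values agree, so a test of zero point biserial correlation is the same test as the two-sample t-test on the group means.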

In a group of closely related papers on serial correlation Hannan (89, 
90) and Watson (207) investigated the correlation of errors in a series of 
ordered observations. In the second cited paper Hannan not only described 
an exact test for the serial correlation present in residuals from the regres- 
sion, but also developed an estimator of the regression coefficient. Watson 
showed that if the errors in the covariance matrix are independently and 
normally distributed in a homogeneous manner, an hypothesis regarding 
regression coefficients could be evaluated thru use of a test statistic with t
or F distributions. Recently, in still another paper, Hannan (91) described 
how serial correlation may be tested for in least squares regression. 


Regression Analysis 


In the area of regression analysis several contributions that might be 
viewed as somewhat more applied than theoretical were made. Guion (77) 
presented a workable solution to the prediction of a quantitative variable 
from a weighted combination of qualitative or categorized variables and 
illustrated his procedures altho significance tests were not cited. In the 
instance of a two-way classification in the dependent or criterion variable, 
Michael and Perry (133) demonstrated both algebraically and numerically
the comparability of the simple discriminant function and multiple regression
technics. Thru a large-scale empirical study involving selection devices
in education, Sevier (175) tested the assumptions underlying multiple 
regression and concluded that while the existence of both linearity and 
homogeneity could usually be justified, there was considerable doubt as to 
the tenability of the assumption of normality. 


Computational Aids 


Many computational aids in the form of short-cut numerical procedures, 
nomographs, charts, and tables appeared that should be of considerable 
service to individuals in applied statistics. In the area of multiple regression 
analysis Moonan (136) provided an arithmetically simple procedure for 
giving a quick estimate of a multiple correlation coefficient and verified 
empirically the feasibility of its use. It would seem to work best when the 
intercorrelations among the predictor variables are relatively homo- 
geneous. Another contribution to multiple regression analysis was Lord’s 
nomograph (125) for calculating the multiple correlation when there are 
two predictor variables. 
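
The quantity read from such a nomograph is presumably the standard two-predictor multiple correlation, which has a simple closed form in the three zero-order correlations. The sketch below states that textbook formula; the function and argument names are illustrative.

```python
from math import sqrt

def multiple_r_two_predictors(r_y1, r_y2, r_12):
    """Multiple correlation of a criterion y with predictors 1 and 2.

    r_y1, r_y2: validities of the two predictors;
    r_12: intercorrelation of the predictors (must not be +1 or -1).
    """
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return sqrt(r_squared)

print(round(multiple_r_two_predictors(0.5, 0.4, 0.3), 4))
```

When the predictors are uncorrelated the expression reduces to the square root of the sum of the squared validities.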

Employing the criterion of least squares, Askovitz (10) developed a 
short-cut graphical procedure for fitting a straight line to a series of points, 
provided they are equally spaced on the axis representing the independent 
variable. To fit an asymptotic regression curve, Patterson (147) proposed 
a simple method. For the computation of correlations based on Q-sorts
Cohen (41) derived a useful equation and published a corresponding
nomograph. 

From four theorems stated by Reiersøl (157) showing the relationship
of sign properties of zero-order correlation coefficients to those of partial 
correlation coefficients, logical deductions could be made regarding what 
the signs of partial correlation coefficients should be. Michael and Caffrey 
(132) described tables to facilitate calculation of first-order partial cor- 
relation coefficients. Perhaps one of the most useful groups of correlation 
tables is the set that Owen (145) prepared for computation of bivariate 
normal probabilities. 
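
The first-order partial correlation that such tables are meant to speed up is given by the usual formula, sketched below with illustrative names. The second call shows how a small positive zero-order correlation can become negative once a third variable is held constant, the kind of sign relationship at issue in the theorems mentioned above.

```python
from math import sqrt

def partial_r(r_12, r_13, r_23):
    """Correlation between variables 1 and 2 with variable 3 partialed out."""
    return (r_12 - r_13 * r_23) / sqrt((1 - r_13 ** 2) * (1 - r_23 ** 2))

print(round(partial_r(0.5, 0.4, 0.3), 4))
print(round(partial_r(0.1, 0.6, 0.6), 4))  # about -0.406
```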

A special correlation procedure was derived by Winer (210) to indicate 
the amount of interrelationship between overlapping groups of individuals 
several of whom may belong to two or more groups in an organization 
of complex structure. The index proposed, which is equivalent to a 
product-moment coefficient, is not only lacking in certain indeterminacies
usually inherent in the creation of fourfold tables pertinent to organiza- 
tional analysis problems, but also fairly insensitive to the absolute size of 
totals in the margins of fourfold tables. 


Factor Analysis 


During the period July 1, 1954, to July 1, 1957, there was no lessening 
of interest in factor analysis as a research technic. So overwhelming was 
the number of papers that only a few of the most important methodological 
contributions can be reviewed. 


Communality Estimation 


The communality problem, the resolution of which is essential to the 
proper application of factor analytic methods, has probably been attacked 
with more vigor than during any previous three-year period since Spear- 
man launched factor analysis more than 50 years ago. A truly major 
addition to the not inconsiderable literature concerning communalities was 
that of Rao (154) who answered two statistical questions. First, how can an 
estimate be obtained of the minimum rank of the variance-covariance 
matrix of the common parts of the tests, that is, of the so-called reduced 
correlation matrix? Second, how can the hypothesis that the rank of the 
population matrix is less than or equal to this estimated rank be tested 
against the alternative that it is greater? Thru applying the theory of 
canonical correlation to obtain a vector basis for a common-factor space 
Rao was able to answer the questions posed and to show that his approach 
furnishes one of several possible solutions to Lawley’s well-known likeli- 
hood equations. Altho a firm statistical basis was furnished for determining 
a common-factor space of a testable number of dimensions, the calculations 
involved in Rao’s procedures for a correlation matrix of even moderate 
size appear to be almost beyond the practical capabilities of most available 
electronic computers.

Altho many years ago Guttman (83) demonstrated that under rather
general conditions the communality of a variable is equal to its squared 
multiple correlation on the remaining variables in the battery as the 
number of these variables becomes arbitrarily large, more recently he (78)
reviewed four approaches that have been used to approximate communali- 
ties and gave a series of theorems and most compelling arguments for 
using as a best possible approximation his well-known lower bound, the 
squared multiple correlation. Recently Guttman (85) offered simple proofs 
of relations between communalities and multiple correlation properties. 
In another paper Guttman (82) proposed improved bounds for com- 
munalities, and more recently he (79) modified a well-known criterion to 
achieve communalities that maximize determinacy.
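
The squared multiple correlation of each variable on the remaining variables can be read from the diagonal of the inverse correlation matrix, as 1 − 1/rʲʲ, where rʲʲ is the j-th diagonal element of the inverse. The sketch below is a minimal illustration of that standard identity using a small hand-written matrix inversion; the function names and the example matrix are assumptions for illustration only.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def squared_multiple_correlations(corr):
    """Lower-bound communality estimates: 1 - 1 / (diagonal of inverse)."""
    inverse = invert(corr)
    return [1 - 1 / inverse[j][j] for j in range(len(corr))]

# Hypothetical correlation matrix with unities in the diagonal.
corr = [[1.0, 0.5, 0.5],
        [0.5, 1.0, 0.5],
        [0.5, 0.5, 1.0]]
print([round(v, 4) for v in squared_multiple_correlations(corr)])
```

Only one matrix inversion is needed for the whole battery, which is part of the practical appeal of this approximation.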

In stipulating the composition of hypothetical variables beyond those in 
the immediate battery, Tryon (198) and Kaiser (115) attempted to calcu- 
late the squared multiple correlation in the limit. Their results appear to be 
somewhat encouraging. With no a priori specification of rank, they seemed 
able to obtain a solution for unknown unique minimum rank communali- 
ties. Additional work by Kaiser (112, 113) indicated that the Tryon- 
Kaiser approach breaks down when minimum rank communalities are not 
unique, a circumstance which will invariably arise in practice. 

From an examination of the ordinal properties of the factor-analytic 
model Bennett (15) set forth a method for determining the dimensionality 
of a score matrix, and Warburton (205) proposed a return to Hotelling’s 
original formulation concerning the use of unities as communality approxi- 
mations. In one of his final papers Thurstone (195) suggested that the 
arbitrary factoring of a correlation matrix be carried out according to the 
criterion that the factors to be extracted minimize the sum of the squares 
of the off-diagonal residual covariances. Altho not computationally feas- 
ible, this method of factoring does not require that communalities be 
known or even approximated; they fall out as an ex post facto byproduct 
of the factoring procedure. 

In relation to the frequently used statistical procedure of separating a set 
of data or scores into two or more parts, Harris (94) showed that in 
factoring, use of unities in the diagonal cells and employment of com- 
munalities leads to a separation of the data into two orthogonal parts. 
Subsequently Harris (93) was concerned with the development of a trans- 
formation procedure for deriving factors based on communality estimates 
from those obtained when unities are employed, a procedure which, when 
applied to observed data, yields common factors that are consistent with 
Lawley’s requirements for a maximum likelihood solution. 

In two truly heroic efforts Wrigley (211, 212), with the aid of an elec- 
tronic computer, attempted to follow thru on textbook recommendations 
that an ideal factoring procedure would be to compute iteratively principal 
axes for a given number of factors until the communality approximations 
converge. His results were not encouraging. With a correlation matrix of 
order 11, hundreds of successive principal axes solutions were necessary.
Not only did different solutions result from different initial approximations,
but also in the matrix selected for investigation the Heywood case
(a value for a communality in excess of unity) occurred more often than 
not. In a nonmathematical article Wrigley (213) discussed in detail some 
distinctions between common and specific variance and concluded that the 
most nearly appropriate approximation to the communality of a test is 
indeed Guttman’s lower bound altho he indicated that it would be desirable 
to examine more closely Rao’s statistical procedures. 

From a purely algebraic viewpoint Guttman (86) considered the general 
question of rank reduction of correlation matrices when communalities 
are used. His most interesting finding was that there are classes of matrices, 
the rank of which may be reduced by only one—a development revealing 
that the implications of some earlier work by others on rank reduction 
are, in fact, false. 

After reviewing the major mathematical results on the communality 
problem thru 1956, Kaiser (112) concluded that the traditional minimum 
rank formulation of the problem is inadequate and that Rao’s recent 
statistical work (154) provides the essential missing ingredient to a prob- 
lem improperly considered algebraic. Kaiser also delineated the psycho- 
metric approach to the communality problem which he described as the 
drawing of inferences regarding a universe of psychological variables from
a particular representative sample or battery of variables chosen from this
universe. 


Rotation 


One of the most exciting developments in factor analysis during the 
period reviewed was the rise of analytic rotational criteria that yield unique 
and objective solutions as contrasted with the subjectivity of graphical pro- 
cedures. The first of these analytic criteria was proposed by Carroll (28), 
whose paper was reviewed in the December 1954 issue of the Review. 
Subsequently a large number of related papers appeared, including one by 
Thurstone (194) yielding a semianalytic solution that was also cited in the 
December 1954 issue of the Review. 

Neuhaus and Wrigley (141) developed what was referred to as the 
“quartimax” method in which it was suggested that the “distribution” 
of squared loadings should be maximum as a means of achieving simple 
structure. Simultaneously Saunders (169) presented his “maximum kurto- 
sis” criterion for orthogonal factors in which the kurtosis of the “distri- 
bution” of factor loadings and their reflections should be a maximum. 
Based on an analogous situation in information theory, Ferguson’s pro- 
posal (60) was that an ideally parsimonious factor matrix is one in which 
the sum of the fourth power of the loadings is maximized. Amazingly in 
the orthogonal case, the criteria of Carroll, Neuhaus and Wrigley, Saun- 
ders, and Ferguson are all mathematically equivalent. 

The problem of achieving an analytic criterion for the oblique case 
proved to be considerably more laborious. Because the form of his criterion
explicitly concerns interfactor relationships, the version proposed by Carroll
was perhaps the most easily generalized for nonorthogonal factors.
Kaiser (114) developed a mathematical solution of Carroll’s criterion for 
the oblique case. Almost simultaneously Pinzka and Saunders (150) ef- 
fected a slight modification in Saunders’ orthogonal criterion that made an 
oblique solution possible. 


Noticing a systematic bias in the orthogonal quartimax solutions as a 
mathematical explication of simple structure, Kaiser (111) proposed his 
“varimax” method. He argued that the quartimax method, rather than
simplifying the loading profile of a factor, concentrates its attention on 
simplifying the factorial composition of a test. Consequently, he revised 
the quartimax method by requiring that the variance of the squared load- 
ings be maximized by factors rather than by tests. His numerical results 
support his hypothesis. Later Kaiser (116) improved the psychometric 
invariance properties of his criterion thru carrying out rotations with 
respect to normalized loadings. By requiring that the sum of the co- 
variances of the squared loadings between all possible pairs of factors be 
minimized, Kaiser (111, 116) outlined the generalization of his varimax 
criterion to the oblique case. 
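
The two orthogonal criteria can be made concrete for the two-factor case. The sketch below evaluates a quartimax-type criterion (sum of fourth powers of the loadings) and a raw varimax criterion (variance of squared loadings, summed over factors) and searches a grid of rotation angles. It is an illustration under simplifying assumptions: a made-up loading matrix, no normalization of the loadings, and a crude one-degree grid rather than the analytic solutions of the cited papers. All names are hypothetical.

```python
from math import cos, sin, radians

def rotate(loadings, degrees):
    """Orthogonal rotation of a two-factor loading matrix."""
    c, s = cos(radians(degrees)), sin(radians(degrees))
    return [[a * c + b * s, -a * s + b * c] for a, b in loadings]

def quartimax(loadings):
    """Sum of the fourth powers of all loadings."""
    return sum(v ** 4 for row in loadings for v in row)

def raw_varimax(loadings):
    """Variance of the squared loadings within each factor, summed over factors."""
    n = len(loadings)
    total = 0.0
    for j in range(len(loadings[0])):
        squares = [row[j] ** 2 for row in loadings]
        mean = sum(squares) / n
        total += sum((v - mean) ** 2 for v in squares) / n
    return total

# Hypothetical simple structure, then deliberately rotated away by 30 degrees.
simple = [[0.8, 0.0], [0.7, 0.0], [0.0, 0.6], [0.0, 0.9]]
mixed = rotate(simple, 30)

angles = range(-44, 45)
best_quartimax = max(angles, key=lambda d: quartimax(rotate(mixed, d)))
best_varimax = max(angles, key=lambda d: raw_varimax(rotate(mixed, d)))
print(best_quartimax, best_varimax)
```

For this clean artificial example both criteria select the rotation that undoes the 30-degree displacement; the differences between them discussed above arise with less tidy loading matrices.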


Probably the best intuitive explanation of the notion of simple structure 
yet to appear was given by Tucker (200) in an introduction to his most 
recent semianalytic procedure. An interesting variant of the simple struc- 
ture theme was presented by Schmid and Leiman (173) who proposed a 
“hierarchical ordering of factors.” Altho a typical solution contains several 
more factors than the traditional Thurstonian approaches that make use of 
simple structure, it reveals simultaneously both simple structure and bi- 
factor patterns while maintaining orthogonal axes. 


Factorial Invariance 


To the psychologist perhaps the final test of the efficacy of factor analysis 
for the scientific study of the internal structuring of correlated variables is 
the notion of factorial invariance. Ahmavaara (4) published a definitive 
monograph on the effect of selection. He provided methods of computing 
transformation matrices aimed at the problem of comparing different factor 
studies and applied his procedures in some detail to Thurstone’s classic 
studies of primary mental abilities. In his monograph Rasch (156) also 
considered the stability of factor loadings under changes in population. 
Another methodological paper on invariance was that of Barlow and Burt 
(12), who contended that two factors may be viewed as identical if, when 
the persons are the same, the two sets of factor-scores are identical, or if, 
when the tests are the same, the two sets of factor loadings are identical. 


Extending their earlier notion of “parallel proportional profiles” to the 
general case, Cattell and Cattell (31) were effectively concerned with find- 
ing the unique transformation matrix for rotational purposes on the basis 
of two experimentally independent studies of the same variables relative
to which the effect of selection is systematically varied. Another noteworthy
paper was that of Wrigley and Neuhaus (214) who considered the match- 
ing of two sets of factors. 


Inverse Factor Analysis 


Interest in the problems of inverse factor analysis continued. In con- 
sidering conditions under which Q and R technics are convertible, Block 
(20) concluded that homogeneity of interaction between variables should 
exist for all individuals, and in another paper Block (19) argued in favor 
of forced rather than unforced Q-sorting procedure. Also comparing R and 
Q technics in the factor analysis of four groups of geometrical solids, 
Lorr, Jenkins, and Medland (126) found that a two-way factor analysis 
may be meaningful if not preferable to reliance on one method. How Q- 
sorts may be used as a rating technic in various educational and personnel 
settings was discussed in an expository paper by Morsh (137). 


Computational Procedures 


Computational savings in factoring relative to an existing simple struc- 
ture hypothesis were described in two papers by Horst (99, 100) and in 
one by Horst and Schaie (101). Rodgers (158) devised a fast approximate 
algebraic rotational procedure, and in inverse factor analysis Bass (14) 
reported a quick iterative means for clustering persons. 

Among other noteworthy developments in factor theory were two papers 
by Guttman (81, 84), in the first of which he described a generalized 
simplex for factor analysis and in the second of which he developed a 
simplified formula for matrix factoring. In a brief note Durand (55) ex- 
plained and illustrated the inversion of a matrix by a square root method. 
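
Inversion by a square root method can be sketched as follows for a symmetric positive definite matrix such as a correlation matrix. The code factors the matrix as A = LLᵀ with L lower triangular (the Cholesky or square root factorization), inverts L, and forms A⁻¹ = (L⁻¹)ᵀL⁻¹. It is a generic illustration; the exact layout of the cited note's procedure is not reproduced here, and the function names are assumptions.

```python
from math import sqrt

def square_root_factor(a):
    """Lower triangular L with a = L L^T (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low

def invert_lower(low):
    """Inverse of a lower triangular matrix by forward substitution."""
    n = len(low)
    inv = [[0.0] * n for _ in range(n)]
    for i in range(n):
        inv[i][i] = 1.0 / low[i][i]
        for j in range(i):
            s = sum(low[i][k] * inv[k][j] for k in range(j, i))
            inv[i][j] = -s / low[i][i]
    return inv

def invert_by_square_root(a):
    """Inverse of a symmetric positive definite matrix via its square root factor."""
    n = len(a)
    low_inv = invert_lower(square_root_factor(a))
    return [[sum(low_inv[k][i] * low_inv[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]

print(invert_by_square_root([[4.0, 2.0], [2.0, 3.0]]))
```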


Miscellaneous Developments 


The determination of factor scores for individuals was the subject of a 
long and comprehensive paper by Guttman (80) who made a definitive 
examination of the problem. His interesting, tho not encouraging, findings 
revealed that for the same observed data alternative sets of such scores 
are usually available that exhibit little relationship to one another. 

Finally mention should be made of the important five-day international 
colloquium in factor analysis held in Paris during July 1955, under the 
auspices of the Centre National de la Recherche Scientifique and the Rocke- 
feller Foundation. The roster of 59 participants was internationally repre- 
sentative of different theoretical viewpoints and of such diverse fields of 
application as psychology, education, sociology, biology, and medicine. 

Published in a book entitled L’Analyse Factorielle et Ses Applications 
(42), the proceedings include 23 main papers, discussions, and résumés.
Containing a singularly complete coverage of virtually the entire discipline 
of factor analysis, the volume affords critical comparisons of technics and 
rationales that would be of interest both to the unsophisticated and to the 
mature student in the field.

The theoretical papers, which comprise about half the volume, offer an
intensive and extensive treatment of current variations in conceptual and 
methodological approaches to factor analysis. Several of the theoretical 
papers include excellent historical reviews concerning the development of 
factor analytic technics that serve to delineate in detail the empirical bases 
for the divergent practices among the British, the Continental, and the 
American groups. In short, the individual who is interested in acquiring 
an overview of the current research being done in factor analysis will 
probably find this volume to be the best single source. 

CHAPTER IX 


Data Processing: Automation in Calculation 


CHARLES WRIGLEY* 


A WIDE variety of computational aids is currently available to educators, 
ranging from slide rule and desk calculator to punched-card equip- 
ment and electronic computation. In this review particular attention will be 
given to electronic computation, for several reasons: (a) The electronic 
computer will probably be the preferred aid in extensive calculations 
whenever available; (b) the machines require the more radical changes in 
research methods; and (c) the area is less known. But since much large- 
scale educational calculating is currently done with punched-card equip- 
ment, developments in that field will thereafter be briefly summarized. 
Much of the computational literature deals with topics, such as numeri- 
cal analysis, computer logic, computer engineering, scientific applications, 
and business applications, which possess no specific reference to education 
or psychology. Instead of trying to make a comprehensive survey of these 
fields, the policy will be followed of supplying certain key references, 
generally themselves well documented and of recent date, which provide 
starting points for any reader wishing to explore further. 
Sections of the review are bibliographies, journals and professional 
societies, types of computers, digital computers, computer availability, pro- 
graming an electronic computer, numerical analysis, computer use in 
education and psychology, factor analysis, punched-card procedures, other 
computational aids, and the brain-machine analogy. 


Bibliographies 


Devoe (32) prepared an unpublished bibliography on the use of com- 
puters in psychology (45 titles). Abstracts of the general computer litera- 
ture appear in Mathematical Abstracts, Mathematical Tables and Other 
Aids to Computation, and IRE Transactions on Electronic Computers. The 
latter journal has an annual review of computer progress which includes 
extensive references (58, 81). The most comprehensive bibliography is the 
IBM one on use of punched-card machines in science, statistics, and educa- 
tion. The latest edition, compiled by Alsop, Flanagan, and Hankam (3) in 
1956, has 762 references. These pertain chiefly to conventional punched- 
card equipment, but there is a scattering of references relating to electronic 
computers. Casey and Perry (21) included a 276-item bibliography in 
their discussion of punched-card methods in science and industry. 

* This manuscript was begun while the author was on the faculty of the University of California, 
Berkeley. Mr. James C. Lingoes of Michigan State University has given useful bibliographic assistance 
in the preparation of this review, and Dr. Steven G. Vandenberg of the University of Michigan has 
supplied helpful information. 

Professional Societies, Journals, and Reference Works 


The first journal devoted specifically to computation was Mathematical 
Tables and Other Aids to Computation, first issued in 1943. The principal 
professional organization in the area is the Association for Computing 
Machinery, founded in 1947, which now has more than 2000 members. 

Three more recent journals relating specifically to electronic computation 
are Computers and Automation, published since 1952; IRE Transactions 
on Electronic Computers, published since 1952; and the Journal 
of the Association for Computing Machinery, published since 1954. (The 
ACM issued Proceedings in 1952 and 1953.) The IBM Technical Newsletter 
is another useful source. “A Who’s Who in the Computer Field” (27) 
and “A Computer Directory and Buyers’ Guide” (26) have recently been 
prepared. 


Types of Computing 


Calculating machines may be divided into two classes, digital and analog. 
A digital machine operates directly with numbers, whereas an analog 
machine represents them by physical quantities, for example, intensity of 
electrical current or length of line. Thus an abacus is digital, whereas a 
slide rule is analogical. Educators have been concerned primarily with 
digital machines (altho the IBM test-scoring machine is analogical), and 
this emphasis will probably continue since analog computers, because they 
depend on measurement of physical quantities, are limited in accuracy. 
Hence only passing reference will be made here to the analog computer 
literature. Berkeley and Wainwright (12) compared digital and analog 
computers and discussed the advantages and disadvantages of each. An 
introduction to analog computer technics was prepared by Johnson (62) 
while Wadel and Wortham (109) listed the location of some of the prin- 
cipal analog computer installations. 

Digital computers can be divided into those with internal storage of 
programs (the orders which control the machine during the calculations) 
and those which rely upon a control panel for instructions. For this review, 
the term electronic computer will be confined to machines with internal 
program storage. Machines which rely upon control panels will be classified 
as punched-card equipment. Internally stored programing enables greater 
computational flexibility and higher speed. 


Digital Computers 


A wide range of books and articles is available dealing with design, 
programing, and scientific application of digital computers. Development 
has been so rapid that recent titles provide the best start on a review of the 
literature. Surveys by Wilkes (113) and Booth and Booth (14) supply 
useful introductions. A special issue of the Proceedings of the IRE (57) 
was devoted entirely to electronic computation and covered a large variety 
of topics. Richards (90) was concerned with the basis in Boolean algebra 
of computer arithmetic. Bowden’s edited symposium (15), altho slightly 
dated, deserves mention on account of its third section dealing with com- 
puter applications since the scientific usefulness of the machines in a wide 
variety of fields is indicated. (There is, however, no specific mention of 
computer use in educational research.) 

The original objective in computer design was to make feasible those 
calculations of such great length that they otherwise either could not be 
performed at all or only with prohibitive labor. More recently it has be- 
come clear that computers will be equally important in achieving more 
rapid processing of comparatively simple clerical tasks. Business applica- 
tions of computers range from maintaining a stock inventory to controlling 
airline reservations. This experience is relevant to educators because the 
same characteristics of large amounts of data and relatively simple cal- 
culations sometimes also pertain to educational research and record- 
keeping. Developments in commerce and industry have been reviewed by 
Kozmetsky and Kircher (67) and by Hattery and Bush (54). 


Computer Availability 


When Fattu surveyed the computational field in 1951, he concluded: “In 
the none too distant future it is likely that a substantial group of universities 
and social science research organizations will have access to automatic 
computing equipment” (37: 425). This prediction has been substantiated 
in the ensuing six years. In 1951 there were few electronic computer in- 
stallations, and those in operation were principally devoted to national 
defense problems. Since there had been no educational or psychological 
use when Fattu prepared his review, his references to electronic computa- 
tion had to be drawn from the technical literature of engineering, physics, 
and applied mathematics. But the computer field was at that time on the 
threshold of rapid expansion, including commercial production of ma- 
chines. By an interesting coincidence, it was in December 1951, the month 
when Fattu’s review was published, that an electronic computer was first 
used in psychology (118). The machine was Ordvac, the first electronic 
computer built at the University of Illinois; it was made available for use 
by all University departments. 

The 1957 picture is greatly different. Over 1200 computers are reported 
to be in operation in the United States, and there are many installations in 
other countries (18). Computational centers have been established at a 
number of universities and research organizations. Machine speeds have 
increased as newer computers have become available, so that the terms high- 
and medium-speed must be regarded as relative. Here the term high-speed 
will arbitrarily be taken to apply to machines which multiply at a rate of 
more than 1000 times per second. Expansion has been so rapid that it is 
difficult to secure a complete list of universities and research organizations 
with high-speed computers. Goode’s tabulation (41), altho only two years 
old, is already dated, and Weik’s survey (112) does not list all installations. 
Hence it has been decided to include a listing here, even if it proves 
incomplete.* High-speed computers are installed at University of California, 
Berkeley (IBM 701); University of California, Los Angeles (IBM 705; 
SWAC); Cambridge (Edsac); Harvard (Univac); Illinois (Illiac); 
Institute for Advanced Study (IAS); Massachusetts Institute of Technology 
(IBM 704; Whirlwind); Michigan State (Mistic); Pennsylvania (Univac); 
RAND Corporation (IBM 704; Johnniac); Sydney (Silliac). IAS, 
Illiac, Mistic, and Silliac are machines built to the same basic design. Iowa 
State College is building another machine of the class. 

Medium-size computer installations include: IBM 650’s at Armour Re- 
search Foundation, Boston University, Carnegie Institute of Technology, 
Case Institute of Technology, Cornell, Educational Testing Service, Florida, 
Georgia Institute of Technology, Houston, Indiana, Iowa State College, 
Kansas, Kentucky, Michigan, New York, North Carolina State College, 
Ohio State, Oklahoma, Oklahoma A. and M., Pittsburgh, Rochester, Stan- 
ford, Texas A. and M., Washington (Seattle), Washington (St. Louis), 
Washington State College, Wayne, and Wisconsin; Datatrons at California 
Institute of Technology, Chicago, Dayton, and Purdue; Ferrantis at Man- 
chester and Toronto; Univac Scientific 1103A at Johns Hopkins; Univac 
Scientific 1101 at Georgia Institute of Technology; Pennstac at Pennsyl- 
vania State; Udec at Wayne; Elecom at Stevens Institute of Technology; 
Alwac at the Adjutant-General’s Office; and CRC’s at the School of Avia- 
tion Medicine and the Naval Postgraduate School. It is not known how 
many of these installations are available for educational and psychological 
research. 


The role of a university computational center was discussed in a sympo- 
sium edited by Hammer (50). Consideration was given to the equipping 
and financing of a center; the decision whether to lease, buy, or build a 
machine; training of personnel; development of better numerical methods; 
organization of a numerical analysis and computational curriculum; and 
calculations characteristic of various departments. Alman (2) and Gotlieb 
(43) contributed other descriptions of university computational centers. 

As a guide to machine selection, Weik’s survey (112) gave speed of 
operation, size of storage, input-output mechanism, rental or purchase 
price, and the like, for 103 makes of digital computers; Carroll (19) com- 
pared some computers in commercial production; and Bauer (11) pro- 
vided machine specifications for half a dozen of the best known high-speed 
machines. Thomson, Harper, and Sawyer (99), in an American Psycho- 
logical Association symposium, discussed the experience of the Personnel 
Research Branch, Adjutant-General’s Office, in acquiring a computer. 
Recommendations as to the augmentation of the supply of trained manpower 
(especially trained programers) were made by a Wayne University 
conference (61). Rowan (93) considered selection of programers by 
means of psychological tests. 

* The author would appreciate being informed of any errors or omissions. 


Programing—Directing an Electronic Computer 


Most people know by now that the principal difficulty in computer use 
is the writing of programs, that is, the sets of orders which control the 
machine during the calculations. McCracken (73) recently issued a book on 
digital computer programing. His concern was not with any particular 
machine, but with the general logic made applicable to a fictitious one. 
Interesting attempts were made by Ward (110) and Hamming (51) to 
express the general principles of programing in brief articles. 


Manuals giving programing instructions and order codes are available 
for most types of machines. A program for matrix multiplication presented 
by Cattell (23) in an appendix to his factor analysis textbook illustrates 
the formal organization. (It is also of interest as apparently the first 
program specifically prepared for psychological and educational use.) The 
programs for Edsac, the Cambridge computer, were reproduced by Wilkes, 
Wheeler, and Gill (114), and these authors rightly stressed the importance 
for any installation of developing and recording a program library as 
rapidly as possible. Frank (39) listed currently available Univac programs, 
and Wrigley (116) did likewise for Illiac. 

An outstanding need is for a complete list of IBM 650 programs of use 
in educational research. At present only partial lists seem to be available, 
for example, abstracts issued by the IBM Applied Science Customer 
Assistance Group (60) and libraries listed by the University of Washington 
(107) and the Datamatic Corporation (25). The IBM library includes pro- 
grams for analysis of variance, chi square, correlation, multiple correlation, 
phi coefficient, autocorrelation, matrix multiplication, matrix inversion, 
latent roots and vectors, and roots of algebraic equations. The University 
of Washington has programs for basic statistics (means, standard devia- 
tion, t-test, and the like), correlation, multiple regression, principal axes 
factor analysis, grade prediction, predictor selection, and optimal test 
time. The University of Wisconsin has a program for maximum likelihood 
factor analysis (52). The quartimax method of rotation in factor analysis 
was programed at Michigan, and the biquartimax method at Harvard (20). 

Andrew L. Comrey (University of California, Los Angeles) prepared 
and mimeographed various correlational and factor-analytic programs for 
SWAC; Jack Block (University of California, Berkeley) did likewise for 
the IBM 701; Kern W. Dickman (University of Illinois) for Illiac; James 
C. Lingoes (Michigan State University) for Mistic; and Jack O. Neuhaus 
(University of California, Berkeley) for the CRC 102-A. 

“Automatic” programing systems designed to leave as much of the labor 
of program organization as possible to the machine are now available 
for a number of computers. For example, Poley and Mitchell (87) prepared 
a manual with the interesting name of SOAP (an abbreviation for 
Symbolic Optimal Assembly Program) which describes a mechanized sys- 
tem of assembly for a 650 program. 

Now that many organizations are using computers, better arrangements 
for program reporting and distribution are urgently needed. Programing 
is expensive: The cost has been estimated to average $2 per instruction, 
and the usual program has some hundreds of instructions (42). Hence 
cooperation is imperative if machines are to be used economically. Several 
groups have recently been formed to try to develop coordinated writing of 
and interchange of programs. A group known as SHARE (the Society to 
Help Avoid Repetitive Effort) is concerned with the IBM 704 (5); another 
group called USE (Univac Scientific Exchange) is fulfilling a similar 
function for the Univac Scientific 1103 A; and a Midwestern University 
Computer Users Committee is primarily concerned with the IBM 650. 


Numerical Analysis 


As might be expected, there has been a spate of books concerned with 
the theory and practice of numerical analysis. But methods continue to 
be discussed for the most part in terms of the desk calculator situation. 
From among the many methods available for most standard calculations, 
for example, solving algebraic equations, there is not yet consensus as to 
the most effective methods for electronic computations. 

Dwyer’s textbook (34) on linear computations probably remains the 
best single reference for the educator altho written with a desk calculator 
orientation. Rao (88) discussed various multivariate technics of potential 
importance to education, including some readily applied only upon an 
electronic computer. Textbooks covering the standard numerical course 
were written by Hildebrand (56) and Nielsen (83). Forsythe (38) and 
Luke (71) listed selected references in numerical analysis. 


Computer Use in Education and Psychology 


This section deals with the use of computers in educational and psycho- 
logical research. Some associated developments in statistical method and 
research design are included. Since considerable attention has been given 
to factor analysis in order to restate it in mechanizable form, it will form 
the subject of a separate section. 

Predictive studies are an obvious avenue for computer use. Previously 
the number of predictors has generally had to be limited to 10 or 12 on 
computational grounds. This restriction is removed by computers. Davison 
(29) used 34 predictors in her study of degree of frustration tolerance dis- 
played by nursery-school children. These included several product terms 
enabling weight to be given in the regression equation to interactions 
observed between predictors. There is, of course, a greatly enhanced problem 
of shrinkage as the number of predictors is increased, but cross validation 
provides some protection until such time as the statistician can supply a 
really satisfactory answer. Simon (97) employed multiple criteria in a 
predictive study of Air Force mechanics. Each criterion measured a dif- 
ferent dimension of job performance; dimensions were selected by a factor 
analysis of the criteria. Merrill and Bennett (79) discussed the application 
of temporal correlation technics in psychology in terms of electronic com- 
puter potentialities. Leiman (69) described the distribution of nearly 
12,000 airmen to jobs in accordance with differential aptitude indexes and 
predetermined quotas. Lee (68) considered computer use in nonlinear 
multivariate prediction, and Ziegler (121) developed a computer proce- 
dure for determining biserial correlations of items with a criterion. 
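
The role of product terms and of cross validation in such predictive studies can be shown with a minimal sketch. It is not drawn from any of the studies cited: the data are artificial, and the two predictors, the true weights, the noise level, the seed, and all function names are assumptions made for illustration. Regression weights (including the weight for a product term) are fitted on one half of the sample and then applied unchanged to the other half, so that the correlation in the second half indicates how far the fitted relationship survives shrinkage.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x

def design_row(x1, x2):
    # Intercept, two predictors, and their product (interaction) term.
    return [1.0, x1, x2, x1 * x2]

def fit(rows, y):
    """Least-squares weights from the normal equations X'X b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

def correlation(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5

def cross_validate(n=200, seed=1):
    """Fit on the first half of artificial data, validate on the second half."""
    rng = random.Random(seed)
    rows, y = [], []
    for _ in range(n):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        rows.append(design_row(x1, x2))
        # Assumed true weights: 1, 2, -1, and 0.5 for the product term.
        y.append(1.0 + 2.0 * x1 - 1.0 * x2 + 0.5 * x1 * x2 + rng.gauss(0, 0.1))
    half = n // 2
    weights = fit(rows[:half], y[:half])

    def predict(row):
        return sum(w * v for w, v in zip(weights, row))

    r_fit = correlation([predict(r) for r in rows[:half]], y[:half])
    r_cross = correlation([predict(r) for r in rows[half:]], y[half:])
    return weights, r_fit, r_cross
```

With many predictors and few cases the cross-validated correlation would be expected to fall well below the fitted one; in this artificial example, with four weights and little noise, the two remain close.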


Computers have been used for experimentation upon the arithmetical 
properties of sampling problems and probability chains. The principle of 
the Monte Carlo method (80) is to estimate a quantity by random sampling 
rather than approximate it by calculation. A computing program is devised 
to calculate a sequence of numbers satisfying statistical tests for random- 
ness. Its greatest usefulness probably applies to cases when there are 
stochastic processes. Tocher (100) discussed the application of computers 
to sampling experiments which can be imitated in the computation. Rosen- 
thal and Ferguson (91) used a pseudo-random sequence to determine the 
sampling distribution of the Friedman nonparametric test. Block (13), 
using empirical data (correlated items drawn from the Minnesota Multi- 
phasic Personality Inventory), examined the proportion of items which 
indicated two samples to be significantly different at some given probability 
level when in fact the population had been randomly divided into two 
parts. 
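
To make the principle concrete, a minimal sketch follows of a sampling experiment of the kind described above: the null distribution of the Friedman rank statistic is estimated by repeatedly generating random rankings. It is not a reconstruction of any procedure in the studies cited; the function names, the seed, and the use of Python's built-in pseudo-random generator are assumptions made for illustration.

```python
import random

def friedman_statistic(ranks):
    """Friedman chi-square_r for n rows, each a ranking of k treatments."""
    n, k = len(ranks), len(ranks[0])
    col_sums = [sum(row[j] for row in ranks) for j in range(k)]
    return (12.0 / (n * k * (k + 1)) * sum(s * s for s in col_sums)
            - 3.0 * n * (k + 1))

def simulate_null(n, k, replications, seed=0):
    """Sample the statistic when every ranking within a row is equally likely."""
    rng = random.Random(seed)
    values = []
    for _ in range(replications):
        ranks = []
        for _ in range(n):
            row = list(range(1, k + 1))
            rng.shuffle(row)
            ranks.append(row)
        values.append(friedman_statistic(ranks))
    return values

def tail_probability(values, observed):
    """Estimated probability of a statistic at least as large as observed."""
    return sum(v >= observed for v in values) / len(values)
```

A call such as `tail_probability(simulate_null(10, 3, 4000), 6.2)` then estimates the probability of a statistic of 6.2 or more for 10 rows and 3 treatments; the accuracy of the estimate depends on the number of replications.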

Two distinctly novel uses of computers were reported. Green (45) 
devised a procedure for preparing stimuli for form perception experiments 
with random elements introduced by generating displays on an oscilloscope 
by a computer program and photographing them on filmstrips. As a con- 
tribution to the rapid processing of, and decision making upon, a large 
amount of information (e.g., relative positions of many fast-moving air- 
craft), Rowan (92) used a computer, first to generate a simulated flight 
pattern, and then to synthesize the data and to make appropriate decisions 
on the best method for handling the situation. 

Tucker (105) seems to have been the first to consider the research 
potentialities of computers. He illustrated his discussion by proposing use 
of a computer for selecting test items with respect to their difficulty in such 
a way as to maximize validity. The suggested function was of such com- 
plexity that use of an electronic computer would be mandatory. Wrigley 
(116) summarized advantages and disadvantages of using computers in 
psychological research and indicated some likely changes in numerical 
practice and research design. 


Factor Analysis 


Factor analysis has traditionally involved recourse to human judgment 
in a way quite foreign to any other standard psychometric technic. High- 
speed machines, however, have made it essential to rewrite factor analysis 
(or any other classificatory technic which seeks to supersede it) in com- 
pletely objective form. 


In a review of contemporary factor analysis, Cattell (24) devoted a sec- 
tion specifically to the effects of the new computational aids upon factor- 
analytic design. The need for more mechanizable procedures seems to have 
been largely responsible for the developments in communality estimation 
and in analytic methods of rotation already reviewed in Chapter VIII. 

When a computer is available, some mathematically more defensible 
method than the centroid can be used. The choice seems to lie between the 
Pearson-Hotelling principal axes method and the Lawley-Rao maximum 
likelihood method. Wrigley and Neuhaus (119) described the use of a 
computer for calculating principal axes solutions. Harris and Peirce (52) 
described a maximum likelihood solution adapted to the IBM 650. Rao 
(89) developed the maximum likelihood method (which he preferred to 
call the canonical method of factor analysis) in an especially convenient 
form for electronic computation, thereby supplying factor analysis with 
a mathematically derived test for the significance of factors. The feasibility 
of the maximum likelihood method, now that computers are available, was 
illustrated by Lord (70) in his analysis of speed factors (a 39-variable 
study). Because of the double iteration, however, to determine both the 
communalities and the number of significant factors, the method is slow 
to converge unless initial estimates are reasonably accurate. 
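
A principal axes solution amounts to finding the latent roots and vectors of the correlation matrix. The sketch below extracts factors one at a time by repeated multiplication (power iteration) followed by removal of each factor from the residual matrix. This is one simple way of obtaining such a solution and is not claimed to be the procedure of any program cited here; it assumes unities in the leading diagonal (so that no latent root is negative), and the function name, start vector, and tolerance are hypothetical.

```python
import math

def principal_axes(r, n_factors, iterations=500, tol=1e-12):
    """Extract factors one at a time from a symmetric matrix r.

    Returns a list of (latent_root, loadings) pairs, largest root first.
    """
    n = len(r)
    resid = [row[:] for row in r]
    factors = []
    for _ in range(n_factors):
        v = [float(i + 1) for i in range(n)]  # arbitrary start vector
        root = 0.0
        for _ in range(iterations):
            w = [sum(resid[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            w = [x / norm for x in w]
            new_root = sum(w[i] * sum(resid[i][j] * w[j] for j in range(n))
                           for i in range(n))
            converged = abs(new_root - root) < tol
            v, root = w, new_root
            if converged:
                break
        if root <= 0.0:
            break  # no further positive latent root found
        if sum(v) < 0:
            v = [-x for x in v]  # fix the arbitrary sign of the vector
        loadings = [math.sqrt(root) * x for x in v]
        factors.append((root, loadings))
        # Remove the variance accounted for by this factor.
        for i in range(n):
            for j in range(n):
                resid[i][j] -= loadings[i] * loadings[j]
    return factors
```

For three variables with all intercorrelations equal to .5, the first latent root is 2 and each loading on the first factor is the square root of 2/3, or about .816.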

Comrey and Levonian (28) used a computer to compare the factors 
derived from the use of three different coefficients of correlation (the 
tetrachoric correlation, the phi coefficient, and the corrected phi coefficient, 
i.e., phi divided by maximum phi) and concluded that the phi coefficient 
was more acceptable for factor-analytic work than generally believed. 
Wrigley and Dickman (117) used an index of factorial matching to study 
sampling variability of loadings when a sample of subjects is randomly 
divided into two. 
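
The phi coefficient and its correction by maximum phi can be computed directly from a fourfold table. The following sketch is illustrative only; the cell labels and function name are assumptions, and the maximum is taken for a positive relationship, that is, the largest phi the two item difficulties allow.

```python
import math

def phi_coefficients(a, b, c, d):
    """Return (phi, phi_max, phi / phi_max) for a fourfold table.

    a: pass both items      b: pass item 1 only
    c: pass item 2 only     d: fail both items
    """
    n = a + b + c + d
    p1, p2 = (a + b) / n, (a + c) / n  # proportions passing each item
    denom = math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))
    phi = (a / n - p1 * p2) / denom
    # Largest joint proportion the marginals allow (for a positive phi).
    phi_max = (min(p1, p2) - p1 * p2) / denom
    return phi, phi_max, phi / phi_max
```

For the table a = 40, b = 10, c = 20, d = 30, phi is about .408 and maximum phi about .816, so that the corrected coefficient is .50.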

Better computational aids are making possible factor analysis of large 
sets of items. Osgood and Suci (85) used a computer in a factor analysis 
of adjectival scales to determine the dimensions of meaning. Comrey and 
Levonian (28) showed the practicability of determining the factorial 
structure of the items of as large an inventory as the MMPI by making 
centroid analysis of the principal scales in succession. Wrigley and others 
(120) factored a set of 200 dichotomous items measuring aircraft mechan- 
ics’ performance by modifying the square root method to enable systematic 
selection of pivot variables. Cattell (22) discussed the possibility of 
enhanced use of P-technic because of computational developments. 

The possibility of a completely objective and therefore completely 
mechanizable system of factor analysis was recently demonstrated by 
Wrigley (115). He proposed insertion of squared multiple correlations 
instead of communalities in the leading diagonal, analytic rotation of all 
factors with positive latent roots, dividing the sample randomly into two 
and repeating the procedure for each section, and then calculating indexes 
of matching for the two sets of factors to determine the replicability of 
each factor. 
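
The first step of such a scheme, the insertion of squared multiple correlations in the leading diagonal, can be sketched as follows. The sketch relies on the standard identity that the squared multiple correlation of variable i with the remaining variables equals 1 - 1/r^ii, where r^ii is the i-th diagonal element of the inverse of the correlation matrix. The function names are hypothetical, the correlation matrix is assumed to be nonsingular, and the remaining steps (rotation, sample splitting, and matching) are not shown.

```python
def invert(a):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[piv][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]

def squared_multiple_correlations(r):
    """SMC of each variable with all the others: 1 - 1 / r^ii."""
    inv = invert(r)
    return [1.0 - 1.0 / inv[i][i] for i in range(len(r))]

def with_smc_diagonal(r):
    """Copy of r with squared multiple correlations in the leading diagonal."""
    smc = squared_multiple_correlations(r)
    n = len(r)
    return [[smc[i] if i == j else r[i][j] for j in range(n)]
            for i in range(n)]
```

For three variables with all intercorrelations equal to .5, each squared multiple correlation is 1/3.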


Canonical analysis generalizes the multiple correlational procedure to 
the situation where there are multiple dependent variables, and each 
criterion dimension is to be predicted separately. Computers make this 
technic (along with much other disused multivariate algebra) practicable. 
Healy (55) reported a rotational method for computing canonical correlations. 


Other investigations have sought to replace factor analysis by some other 
technic taking more into account the patterns and clusters revealed by the 
data. Altho this configurational approach seems to accord with clinical 
experience in psychology, and also with holistic theoretical conceptions, the 
development of satisfactory statistical methods has been delayed by the 
computational labor of examining a multiplicity of patterns. McQuitty 
(77, 78) devised new technics for handling the configurational problem, 
named agreement analysis and linkage analysis respectively, which have 
been oriented toward use of an electronic computer. Likewise Tryon (104) 
modified his method of cluster analysis to make use of the new machines. 


Punched-Card Procedures 


Ten years ago a sharp distinction could be made between electronic 
computers and punched-card equipment. The latter included sorters, tabu- 
lators, reproducing punches, multiplying punches, and the like. The bound- 
ary is no longer so distinct. Electronic computers often have punched-card 
input and output, while multiplying punched-card units may have vacuum 
tubes for more rapid calculation. The dichotomy based upon whether or not 
programs are internally stored means that the IBM 604 and 607 are here 
classified as punched-card equipment. Their electronic components, how- 
ever, might entitle them to be classified with the electronic computers. 


General surveys of punched-card operations were prepared by Casey 
and Perry (21) and by Hartkemeier (53), while a manual was prepared 
outlining the fundamental principles, applicable to all types of punched- 
card machine, of control-panel wiring (59). Gruenberger (48) issued a 
computing manual based upon university use of punched-card equipment 
at the University of Wisconsin, and later supplied control-panel diagrams 
for some standard computations (49). Lunneborg, Wright, and Ax (72) 
published some plugboard diagrams prepared at the University of 
Washington. 

Appel and Cooper (4), Bass and Wurster (10), and Deemer (30) considered 
the use of mark-sense cards to eliminate card-punching and card-verifying. 
Traxler (102, 103) discussed the merits and demerits of the 
test-scoring machine, and Staugas (98) dealt with test scoring when item 
data were already punched on the cards. 

Correlational technics received a good deal of attention. Ayers and 
Stanley (8) described a rolling totals method for forming sums of squares 
and cross-products, while Burke (16) dealt with the special case when 
some numbers were negative. Procedures for solving multiple regression 
equations were presented by Allan and Attridge (1) and by Greenberger 
and Ward (46). The calculation of serial correlation coefficients was des- 
cribed by Payne and Staugas (86) and by Schipper and Gruenberger 
(94), and the computation of residual matrices in factor analysis was 
handled by Friedman and Ward (40). 
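
The sums of squares and cross-products accumulated by such procedures feed the usual raw-score formula for the product-moment correlation. The sketch below shows that accumulation and the final formula only; it does not reproduce the rolling totals method or any other punched-card procedure cited, and the class and method names are hypothetical.

```python
import math

class RunningSums:
    """Accumulate n, sums, sums of squares, and the cross-product."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.syy = self.sxy = 0.0

    def add(self, x, y):
        """Add one pair of scores to the running totals."""
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y

    def correlation(self):
        """Product-moment r from the raw sums."""
        num = self.n * self.sxy - self.sx * self.sy
        den = math.sqrt((self.n * self.sxx - self.sx ** 2) *
                        (self.n * self.syy - self.sy ** 2))
        return num / den
```

For the three pairs (1, 1), (2, 3), and (3, 2) the totals give a numerator of 3 and a denominator of 6, so that r = .50.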

Various papers related to item analysis and test construction. MacLean 
and Tait (76) dealt with the computation of item and test means, variances, 
correlations, and item-selection indexes. A method for calculation of a 
joint occurrence matrix was described by Grace (44), while Caffrey and 
Wheeler (17) devised a new chi-square formula which could be more 
simply handled on punched-card equipment. DuBois, Loevinger, and Smith 
(33) devised a method called edge punching for calculating variances and 
covariances of dichotomous items. Farrell and Stern (36) reported on cal- 
culation of tetrachoric correlations; and Siegel and Cureton (96), on the 
calculation of biserial correlations for the evaluation of items. 

Kephart and Oliver (65) dealt with the scoring in the method of paired 
comparisons, and Kahn and Bodine (63) discussed Guttman scale analysis. 
The analysis of factorial experiments was considered by Bainbridge, Grant, 
and Radok (9), and the calculation of an uncertainty function in informa- 
tion theory by Newman (82). Katz (64) described the analysis of multiple- 
level sociometric data. Ward (111) illustrated the great saving in time 
made by an IBM 607 by reanalyzing data. 


Other Computational Aids 


Even in a period dominated by the emergence of the electronic com- 
puter, the simpler computational aids were not entirely neglected. Gruen- 
berger (47) advised on selection of a desk calculator, and Dwyer (35) 
considered the circumstances in which a desk calculator is to be preferred 
to punched-card equipment or electronic computers. Arnold (6) wrote a 
book on use of the slide rule. 


The Brain-Machine Analogy 


While there is general agreement that an electronic computer is very 
different in its operation from a human brain, the points of comparison 
and contrast continue to interest scholars and led to a provocative and 
challenging series of papers. A symposium of the Institute of Radio 
Engineers (101) provides a good start for any reader wanting to pursue 
this line of investigation. Other important papers were written by Mc- 
Culloch (74), Turing (106), MacKay (75), Von Neumann (108), Wilkes 
(113), Ashby (7), and Shannon (95). Specific consideration was given to 
machine “learning” and machine “insight” by Oettinger (84) and Deutsch 
(31) respectively, while Kochen (66) considered “group behavior” of 
robots. 


Summary 


This review affords only a glimpse of the activity in the computational 
field from 1951 to 1957. Many references could not be cited for lack of 
space, and much other computational development remains unpublished. 
By now many universities have computing centers, and educators are regu- 
larly working with calculating machines and becoming familiar with their 
strengths and limitations. But the field is in a transitional state. There 
remains a shortage of trained personnel; there is redundant programing 
and rather little interchange of information from one organization to 
another; program libraries are often inadequately reported; there has 
probably been too heavy an emphasis on the traditional procedures of 
multivariate analysis; and the full range of mathematical and statistical 
potentialities has hardly yet been explored. Fattu’s comment of 1951 
remains appropriate to a considerable extent in 1957: “Adapting these 
new technics to research remains.” But the foundations have been laid. 
CHAPTER X 


Action Research: A Teaching or a Research Method? 


BERNARD R. CORMAN 


Closing his 1953 review of action research (the terms cooperative and 
cooperative action are favored currently), Wann (30) remarked that 
an action methodology was only beginning to emerge and that additional 
experimentation would be needed to make it practical for teachers to carry 
on research of a high quality. Since this major survey of the literature, the 
Review has carried a number of reports on the further development of the 
action-research line (11, 21, 29). It seems desirable, therefore, to evaluate 
these efforts as contributions to methodology as well as further to catalog 
the literature. As an organizing theme we are returning to Wiles’s earlier 
queries (32) on a distinction between action research and inservice train- 
ing. 

If we examine the writing about action research, it becomes apparent 
that some difference of opinion exists, even among those who are most 
enthusiastic, as to whether the action research approach does, in fact, 
constitute a new research departure. There are those who contrast action 
research with traditional or fundamental research (2, 3, 4, 5, 15). In its 
sharpest form this separation gives rise to statements to the effect that, 
instead of borrowing without change research methods from the sciences, 
research workers in education are now striking out on their own and 
building new skills (4). In a more moderate vein the distinction is made 
on the basis of the kinds of problems researched (1, 5, 25), on the adapta- 
bility of the research findings to real situations (5), on the motivation of 
the researchers (7), on the kinds of generalizations sought (15), on the 
intrinsic value of the research to the practitioner (12, 17), and most often 
and most importantly on who does the research (3, 5, 25). 

Do any of these distinctions, important as they are, provide the founda- 
tions for a new methodology? Our answer will depend, of course, on a 
definition. If we require, as a minimum condition, that a new methodology 
or technic give us a new way of organizing or analyzing phenomena so as 
to lead to the generation and test of new hypotheses or to fresh ways of 
testing old ones, it is questionable if any of the features central in the 
thinking of the action researchers make much of a contribution to research 
methodology (however great a contribution they do make to the inservice 
education of teachers). 

This is a rather sweeping judgment, but it can be tested in several ways. 
If, for one thing, we examine the three major recent books concerned with 
action research (3, 14, 22), we will find that once the questions of getting 
research used and of getting teachers to hypothesize cooperatively about 
their concerns are dealt with, the business of testing these hypotheses must 
be faced. And at this point the discussions are reduced to the employment 
of the traditional methodologies and technics. This reduction is discerned 
by the writers. 

Others have seen the point as well. Thus Ahrens (1) spoke of the neces- 
sity of adapting the methods and procedures of the professional researchers. 
Shumsky (25) pointed out that the distinction between cooperative and 
individual research as separate processes was vastly overdrawn. Blum (5) 
saw the essential difference as one of the attitude of the researcher. 

Finally, to support the conclusion above, we can examine the production 
of the action-research teams. These are some of the hypotheses that have 
resulted: that spelling will be improved if the material is chosen from 
“real” material rather than from spellers (1); that students will benefit 
if common experiences, alternative methods of word attack, and oral 
experiences precede reading (28) ; that a survey will identify the nature of 
the remedial reading problems in a school (17); that favored reading 
material depends on content rather than length (18). 

The point is not at all that these hypotheses are trivial, for any method 
may be used to research trivia, and, moreover, there is no reason for 
questioning the authors when they state that the hypotheses were important 
to the teachers doing the research. The point is, rather, that these are hypo- 
theses which could have been arrived at thru any of the traditional methods. 
The argument is even sharper when we examine the technics by which the 
hypotheses were tested. Checklists, case studies, tests of means, and similar 
devices were employed. As a contrast, compare the jump forward in the 
nature of the hypotheses which were made testable by the introduction of 
a technic like analysis of variance, for example (27). 

What is at issue is not the purity of research. Rather it seems important 
to achieve clarity about the nature of action research in order to protect the 
positive contribution which the approach has made. 

Common to many of the reported action researches are statements to the 
effect that teachers found cherished prejudices challenged, that leadership 
was developed, that lines of communication became clearer, that interest 
in research was engendered, and that curriculum change was facilitated 
(1, 13, 14, 16, 17, 18, 19, 24, 28). There are reports of success in incor- 
porating the approach both in the training of teachers and in the teaching 
of public-school classes (10, 13, 20, 22, 26, 31). Any movement which will 
encourage a turn toward problem solving in teacher education needs to be 
nourished. This, it would appear, is the distinctive contribution which 
action research does make. 

The emphasis on action research as a separate and distinctive modern 
research methodology to be contrasted with traditional methodologies 
seems self-defeating of this positive contribution. Such emphasis introduces 
conceptions which will not produce warranted evidence either about 
practical or impractical problems. One example is the conception that 
research findings become valued only at the point where the teacher repli- 
cates them (12). Strictly read, such a conception negates the possibility of 
learning anything from the research efforts of others. We would find our- 
selves asking questions which have been asked and answered many times 
before, and each of us would be put in the position of accepting only the 
answers obtainable with our own technical skill (3). Stemming from this 
viewpoint is a revival of the conception that there is some fundamental 
difference between applied and pure research, which takes expression in 
repeated statements that action researchers are somehow not interested in 
generalization, but only in whether or not teachers are accomplishing 
the things that they hope to accomplish (4, 7). But as Prewett (23) sug- 
gested, if our goal is to channel teachers into research rather than research 
into teachers, we need more rather than less concern for theory. 

The principal danger in confusing the teaching and research functions of 
action research is that the confusion is taken to justify relaxing the 
elementary safeguards of the warranty of evidence. It is 
one thing to defend oversimplified hypotheses (3), to overlook necessary 
controls (28), to encourage changing hypotheses in midstream (5), to 
ignore the problem of reliability in stating that teachers are best fitted to 
know what is going on in their classrooms (3), and so on, if our purpose 
is simply to get teachers to engage in group processes or to introduce them 
to problem-solving methods. It is something else again if we defend these 
practices as somehow contributing to the production of valid inferences. 

The original point of departure of action research was the failure of 
educational research to play a significant role in changing practice (6, 8). 
The answer found was to change the research personnel, to involve class- 
room teachers more directly in the research. If this is to be achieved, it will 
mean that teachers will have to take the time and effort to acquire the 
necessary tools, and that rather than striking out on their own, more time 
will have to be given in the teacher-education program for the study of 
methods of inquiry. Corey (9) spelled out some of the implications of the 
action-research movement for the teacher-education program. 

Finally, note has to be taken of Blum’s argument (5) that the action- 
research movement is a revolt against the separation of fact and values. If 
educational research, whether done cooperatively or noncooperatively, by 
action or in action, by professional or amateur, is to be valued, it is 
necessary that the researchers make explicit the preferences that undergird 
their efforts, and tackle the problems which are most pressing rather than 
those which are most convenient. 
Lord, F. E., Coordinator of Special Education, Los Angeles State College of Applied 
Arts and Science, 5280 Gravois Street, Los Angeles 32, California. 

Loretan, Joseph O., Associate Superintendent, 110 Livingston Street, Brooklyn 1, 
New York. 

Lorge, Irving, Professor of Education, Teachers College, Columbia University, New 
York 27, New York. 

Lourenco-Filho, M. B., Full Professor, Educational Psychology, University of 
Brazil, Rua Pedro Guedes 56, Rio de Janeiro, Brazil. 

Lovejoy, Philip C., P.O. Box 985, Castalia, Ohio. 

Lucio, William H., Assistant Dean, School of Education, University of California, 
Los Angeles 24, California. 

Lucow, William Harrison, Associate Professor, Faculty of Education, University of 
Manitoba, Winnipeg, Manitoba, Canada. 

Luker, Arno H., Professor of Educational Psychology and Guidance, Colorado State 
College of Education, Greeley, Colorado. 

Lund, Kenneth W., Assistant to the General Superintendent, Chicago Public Schools, 
228 North LaSalle Street, Chicago 1, Illinois. 

Luther, Gertrude Hawkins, Bureau of Education Research, Board of Education, 
Cleveland, Ohio. 

Lyle, Mary S., Professor of Home Economics Education, Home Economics Building, 
Iowa State College, Ames, Iowa. 

Lyman, Howard B., Assistant Professor of Psychology, Department of Psychology, 
University of Cincinnati, Cincinnati 21, Ohio. 

§ Lyon, Don O., Graduate Assistant in School Administration, 300 South Forest, 
Vermillion, South Dakota. 

+ McCall, William A., 433 Valencia Avenue, Coral Gables, Florida. 

— Lloyd E., Associate Professor of Education, Butler University, Indianapolis, 
Indiana. 

McClelland, James N., Manager, Publications and Training Service, Northrop Air- 
craft, Inc., Hawthorne Field, Hawthorne, California. 

McClure, Worth, Executive Secretary Emeritus, American Association of School 
Administrators, 2122 California Street, N. W., Washington 8, D. C. 

McClurkin, W. D., Director, Division of Surveys and Field Services, George Peabody 
College for Teachers, Nashville 4, Tennessee. 

McCluskey, Howard Y., Professor of Educational Psychology, School of Education, 
University of Michigan, Ann Arbor, Michigan. 

McConagha, Glenn Lowery, Administrative Vicepresident, Muskingum College, New 
Concord, Ohio. 

MacConnell, James D., Associate Professor of Education, Stanford University, 
Stanford, California. 

McConnell, T. R., Professor of Higher Education, University of California, Berkeley 
4, California. (President of AERA, 1941-42.) 

Macdonald, James B., Apartment 4, 2302 Enfield Road, Austin 3, Texas. 

McDonald, Richard J., Director of Research and Guidance, Corning City School 
District, 291 East First, Corning, New York. 

McFall, Kenneth H., Provost, Bowling Green State University, Bowling Green, Ohio. 

—* John W., Superintendent, Vernon Public Schools, Box 1618, Vernon, 
Texas. 
McGauvran, Mary E., Dean of Women, and Professor of Education, State Teachers 
College, Lowell, Massachusetts. 

McGuire, Carson, Professor of Educational Psychology (Human Development), 
The University of Texas, Austin 3, Texas. 

McKeachie, Wilbert J., Associate Professor of Psychology, University of Michigan, 
Ann Arbor, Michigan. 

McKenna, Bernard H., Associate Executive Secretary, Metropolitan School Study 
Council, Teachers College, Columbia University, 525 West 120th Street, New York 
27, New York. 

Mackenzie, Gordon N., Professor, Teachers College, Columbia University, New 
York 27, New York. 

McKillop, Anne Selley, Associate Professor of Education, Teachers College, 
Columbia University, New York 27, New York. 

McKim, Margaret G., Professor of Education, Teachers College, University of 
Cincinnati, Cincinnati 21, Ohio. 

Mackintosh, Helen K., Chief, Elementary Schools Section, Division of State and 
Local School Systems, U. S. Office of Education, Department of Health, Education, 
and Welfare, Washington 25, D. C. 

McKone, Frederick W., Associate Professor, Teachers College of Connecticut, New 
Britain, Connecticut. 

+McLaughlin, Katherine L., Retired Professor of Education, University of Cali- 
fornia, Los Angeles, California. 

McLaughlin, Kenneth F., Associate Professor and Director of University Test 
Service, Florida State University, Tallahassee, Florida. 

McLure, William P., Director, Bureau of Educational Research and Professor of 
Education, College of Education, University of Illinois, Urbana, Illinois. 

McManus, R. Louise, Director, Division of Nursing Education, Teachers College, 
Columbia University, 525 West 120th Street, New York 27, New York. 

McNally, Harold J., Professor of Education, Teachers College, Columbia University, 
525 West 120th Street, New York 27, New York. 

McQuagge, Carl L., Associate Professor of Educational Administration, Mississippi 
Southern College, Box 85, Station A, Hattiesburg, Mississippi. 

McSwain, E. T., Dean, School of Education, Northwestern University, Evanston, 
Illinois. 

Mack, Esther, Assistant Professor, School of Education, State College of Washington, 
Pullman, Washington. 

— I. N., Counseling and Guidance (Private), 620 Sixth Avenue, Lewiston, 
Idaho. 

Mallinson, George Greisen, Dean, School of Graduate Studies, Western Michigan 
University, Kalamazoo, Michigan. 

Maney, Ethel Swain, 121 Montgomery Avenue, Bala Cynwyd, Pennsylvania. 

Manolakes, George, Associate Professor of Education, New York University, Wash- 
ington Square, New York 3, New York. 

Manuel, Herschel T., Professor of Educational Psychology, The University of Texas, 
Austin 3, Texas. 

Manwiller, Charles E., Director of Curriculum Study and Research, Pittsburgh Public 
Schools, Administration Building, Pittsburgh, Pennsylvania. 

Marriott, John C., Research Engineering Staff, Ford Motor Company, P.O. Box 
2053, Dearborn, Michigan. 

Martin, Miss Clyde I., Associate Professor of Curriculum and Instruction, College 
of Education, The University of Texas, Austin 3, Texas. 

Martin, Edwin D., Assistant Superintendent, Secondary Education, Houston Public 
Schools, Houston, Texas. 


Martin, W. Howard, Associate Professor, Agricultural Education, University of 
Connecticut, Storrs, Connecticut. 

Mason, John M., Associate Professor, Teacher Education, College of Education, 
Michigan State University, F-12 Wells Hall, East Lansing, Michigan. 

Masters, Harry V., President, Albright College, Reading, Pennsylvania. 

Mathews, Chester O., Professor of Education and Director, Evaluation Service, 
Ohio Wesleyan University, Delaware, Ohio. 
Mathis, Byron Claude, Assistant Professor, School of Education, Northwestern 
University, Evanston, Illinois. 

Matteson, Ross W., Associate Professor, Counseling Center, Michigan State Univer- 
sity, East Lansing, Michigan. 

Matthews, Joseph L., Assistant Director, Division of Extension Research and 
Training, Federal Extension Service, U. S. Department of Agriculture, Washington 
25, D. C. 

Maucker, J. William, President, Iowa State Teachers College, Cedar Falls, Iowa. 

Maul, Ray C., Assistant Director, Research Division, National Education Association, 
1201 Sixteenth Street, N. W., Washington 6, D. C. 

Maw, Wallace H., Associate Professor of Education, University of Delaware, Newark, 
Delaware. 

Maxwell, J. S., Associate Professor of Education and Principal, University Laboratory 
School, 212 Education Building, Columbia, Missouri. 

Mayo, Samuel T., Assistant Professor of Education, Loyola University, 820 North 
Michigan Avenue, Chicago 11, Illinois. 

Mayor, John R., Director of Education, American Association for the Advance- 
ment of Science, 1515 Massachusetts Avenue, N. W., Washington 5, D. C. 

+t Mead, Arthur R., Director Emeritus of Educational Research, College of Education, 
University of Florida, Gainesville, Florida. (1719 N. W. Sixth Avenue, Gainesville, 
Florida.) 

Mech, Edmund V., Assistant Professor, Psychology Department, Pennsylvania State 
University, 205 Burrowes Building, University Park, Pennsylvania. 

Meder, Elsa M., Associate Editor, Educational Department, Houghton Mifflin 
Company, Boston, Massachusetts. 

Medley, Donald M., Assistant Professor of Education, Division of Teacher Education, 
Municipal Colleges of New York City, 535 East 80th Street, New York 21, New York. 

Meece, Leonard E., Professor, Educational Administration, University of Kentucky, 
Lexington, Kentucky. 

Melcher, George, Superintendent Emeritus, Kansas City Public Schools, Kansas City, 
Missouri. (Secretary-Treasurer of AERA, 1915-18.) 

Melville, S. Donald, Associate Director, Cooperative Test Division, Educational 
Testing Service, 20 Nassau Street, Princeton, New Jersey. 

Meredith, Cameron W., Professor of Psychology, State University of New York, 
Teachers College, Oswego, New York. 

Merenda, Peter F., Research Psychologist, 1701-70th Street, Kenosha, Wisconsin. 

Merigis, Harry J., Director, Laboratory School, Eastern Illinois University, Charles- 
ton, Illinois. 

Merwin, Jack C., Assistant Professor of Education, Syracuse University, Syracuse 10, 
New York. 

Michael, William B., Director, Testing Bureau, and Professor of Psychology and 
Education, University of Southern California, Los Angeles 7, California. 

Michaelis, John U., Professor of Education, Department of Education, University of 
California, Berkeley 4, California. 

Miles, John R., Manager, Education Department, Chamber of Commerce of the 
United States, 1615 H Street, N. W., Washington 6, D. C. 

Miles, Matthew B., Assistant Professor of Education, Horace Mann—Lincoln Institute 
of School Experimentation, Teachers College, Columbia University, New York 27, 
New York. 

Miles, Vaden W., Professor of Physics, Wayne State University, Detroit 2, Michigan. 

Millard, Cecil V., Director, Child Development Laboratory, Michigan State University, 
East Lansing, Michigan. 

Miller, Lebern N., Associate Professor of Education, Department of Education, 
University of Tulsa, Tulsa 4, Oklahoma. 

Miller, Murray Lincoln, Director of Audio-Visual Education, Illinois State Normal 
University, Normal, Illinois. 

Minnich, A. E., Director of Personnel, Middletown City Schools, 1410 First Avenue, 
Middletown, Ohio. 

Mitchell, Mrs. Blythe C., Editor, Division of Test Research and Service, World Book 
Company, Yonkers-on-Hudson, New York. 
Mitchell, Guy C., Professor of Education and Director of Graduate Studies, Mis- 
sissippi College, Clinton, Mississippi. 

Mitzel, Harold E., Acting Director, Office of Research and Evaluation, Division of 
Teacher Education, Municipal Colleges of New York City, 535 East 80th Street, 
New York 21, New York. 

Moffitt, Mrs. Mary W., Assistant Professor, Queens College, Kissena Boulevard, 
Flushing, New York. 

§ Moldstad, John A., Assistant Professor of Education and Assistant in Research, 
Audio-Visual Center, Indiana University, Bloomington, Indiana. 

** Monroe, Walter S., Distinguished Professor of Education Emeritus, University 
of Illinois. (211 South Castanya Way, Menlo Park, California.) (President of AERA, 
1916-17; Editor, Encyclopedia of Educational Research 1940 and 1950 editions.) 

Moonan, William J., Director, Statistical Division, USN Personnel Research Field 
Activity, San Diego 52, California. 

+ Moore, Clyde B., Professor Emeritus of Education, School of Education, Cornell 
University, Ithaca, New York. 

Moore, Harold E., Director, School of Education, University of Denver, Denver 10, 
Colorado. 

Moore, Walter J., Associate Professor, College of Education, University of 
Illinois, Urbana, Illinois. 

Morgan, Barton, Professor of Education, Iowa State College, 220 Curtiss Hall, 
Ames, Iowa. 

Morgan, Walter E., Chief, Dependents Schools Branch, HQ Pacific Air Forces, APO 
953, San Francisco, California. 

Moriarty, Mary J., Professor of Education, State Teachers College, Bridgewater, 
Massachusetts. 

Mork, Gordon M. A., Associate Professor of Education, University of Minnesota, 
Minneapolis 14, Minnesota. 

Morneweck, Carl D., Director of Research, Department of Public Instruction, 
Commonwealth of Pennsylvania, Education Building, Harrisburg, Pennsylvania. 

Morphet, Edgar L., Professor of Education, University of California, Berkeley 4, 
California. 

Morrison, Mrs. Harriet Barthelmess, Consulting Psychologist, Derry, New Hamp- 
shire. 

Morrison, J. Cayce, Director, The Puerto Rican Study—The Education and Adjust- 
ment of Puerto Ricans in New York City, New York. (President of AERA 1929-30; 
Chairman, Editorial Board, Review of Educational Research, 1943-1948; Chairman, 
Board of Editors, Encyclopedia of Educational Research, 1947-1949, 1956- .) 

Mort, Paul R., Richard March Hoe, Professor of Education, Teachers College, 
Columbia University, New York 27, New York. (President of AERA, 1951-52.) 

Morton, R. L., Professor of Education, The Ohio University, Athens, Ohio. (Vice- 
president of AERA, 1931-32.) 

Mosier, Earl E., Assistant Commissioner for Higher Education, Department of 
Education, Trenton, New Jersey. 

Moulton, John K., Teacher, Brookline High School, 115 Greenough Street, Brookline, 
Massachusetts. 

.. George Joseph, Professor of Education, University of Miami, Coral Gables, 
Florida. 

Moynihan, The Rev. James F., SJ, Professor of Psychology and Education, Boston 
College, Chestnut Hill 67, Massachusetts. 

Mullen, Frances A., Assistant Superintendent of Schools, Board of Education, 228 
North LaSalle Street, Chicago 1, Illinois. 

Munves, Mrs. Elizabeth D., Assistant Professor, School of Education, New York 
University, Washington Square, New York 3, New York. 

— Bernard C., Head, Department of Psychology, Wesleyan College, Macon, 
Georgia. 

Murphy, Alton C., Associate Professor, Educational Psychology, and Director, Exten- 
sion Teaching and Field Service Bureau, Extension Building 201, The University 
of Texas, Austin 3, Texas. 
Murphy, Harold D., Assistant Director of Student Personnel, East Texas State Col- 
lege, East Texas Station, Commerce, Texas. 

Murphy, Helen A., Professor of Education, School of Education, Boston University, 
332 Bay State Road, Boston 15, Massachusetts. 

Myers, Charles T., Associate in Test Development, Educational Testing Service, 20 
Nassau Street, Princeton, New Jersey. 

Myers, Garry C., Editor, Highlights for Children, Boyds Mills, Wayne County, 
Pennsylvania. 

Myers, Spencer W., Superintendent, Flint Public Schools, Oak Grove Campus, East 
Kearsley and Crapo Street, Flint 3, Michigan. 

Nally, Thomas P. F., Associate Professor, Department of Education and Psychology, 
University of Rhode Island, Kingston, Rhode Island. 

Naslund, Robert A., Associate Professor of Education, University of Southern 
California, Los Angeles 7, California. 

Nason, Doris E., Assistant Professor of Education, University of Connecticut, Box 
U-33, Storrs, Connecticut. 

Natale, Joseph Paul, Assistant Director, Department of Employee Services, Denver 
Public Schools, 414 Fourteenth Street, Denver 2, Colorado. 

Nathanson, Jules L., Director of Research, Hartford Public Schools, 249 High Street, 
Hartford, Connecticut. 

Navarra, John G., Associate Professor of Science, East Carolina College, Greenville, 
North Carolina. 

Nelson, Clarence H., Office of Evaluation Services, Michigan State University, East 
Lansing, Michigan. 

Nelson, Kenneth G., Director, Training Research Division, U. S. Naval Personnel 
Research Field Activity, 19th and East Capitol Streets, N. W., Washington 25, D. C. 

Nelson, M. J., Dean of the College and Dean of Instruction, Iowa State Teachers 
College, Cedar Falls, Iowa. 

+ Nelson, Milton G. (Retired), 166-19th Avenue North, Lake Worth, Florida. 

Nemzek, Claude L., Chairman, Education Department, University of Detroit, Detroit, 
Michigan. 

Nerbovig, Marcella H., Associate Professor of Education, Northern Illinois University, 
DeKalb, Illinois. 

Nesi, Carmella, Principal, Junior High School 143, 120 West 231st Street, Bronx 63, 
New York. 

Netzer, Royal F., President, State University Teachers College, Oneonta, New York. 

Newell, Clarence A., Professor of Educational Administration, College of Education, 
University of Maryland, College Park, Maryland. 

Newland, T. Ernest, Professor of Education, University of Illinois, Urbana, Illinois. 

— ogg H., Professor of Education, Michigan State University, East Lansing, 
Michigan. 

Nolstad, Arnold R., Associate Professor of Mathematics, North Carolina State 
College, Raleigh, North Carolina. 

North, Robert D., Jr., Assistant Director, Educational Records Bureau, 21 Audubon 
Avenue, New York 32, New York. 

Northby, Arwood S., Director, Division of Student Personnel, University of Con- 
necticut, Storrs, Connecticut. 

Norton, John K., Director, Division II—Administration and Guidance, Teachers 
College, Columbia University, New York 27, New York. (President of AERA, 
1927-28.) 

Nothern, E. F., Assistant Professor of Education, University of Kansas, 307 Bailey 
Hall, Lawrence, Kansas. 

O’Brien, Cyril C., Associate Professor, Department of Education, Marquette Univer- 
sity, Milwaukee 3, Wisconsin. 

— F. P., Professor Emeritus of Education, University of Kansas, Lawrence, 
Kansas. 

Odell, C. W., Professor of Education, University of Illinois, Urbana, Illinois. 


Ohlsen, Merle M., Professor of Education, College of Education, University of Illinois, 
Urbana, Illinois. 
Ojemann, Ralph J., Director, Preventive Psychiatry Research Program, State 
University of Iowa, Iowa City, Iowa. 

O’Kelley, G. L., Jr., Specialist in Teacher Training & Service Studies, Agricultural 
Education Branch, U. S. Office of Education, Department of Health, Education, 
and Welfare, Washington 25, D. C. 

Oliverio, Mary Ellen, Assistant Professor of Education, Teachers College, Columbia 
University, 525 West 120th Street, New York 27, New York. 

Olson, Willard C., Dean, School of Education, University of Michigan, Ann Arbor, 
Michigan. (President of AERA, 1948-49.) 

Oppenheimer, J. J., Head, Department of Education, University of Louisville, Louis- 
ville 8, Kentucky. 

Orata, Pedro T., Programme Specialist, UNESCO, 19 Avenue Kleber, Paris 16, 
France. 

Orr, David B., Associate Research Scientist, American Institute for Research, 6135 
Kansas Avenue, N. E., Washington 6, D. C. 

Orshansky, Bernice, Research Psychologist, USAF Continental Air Command, 
2200th Test Squadron, Mitchel Air Force Base, Hempstead, New York. 

Osborne, R. Travis, Director, Guidance Center and Associate Professor of Psychology, 
University of Georgia, Athens, Georgia. 

Ostreicher, Leonard M., Personnel Coordinator, The Martin Company, Baltimore 
3, Maryland. 

Otto, Henry J., Graduate Professor of Elementary Administration and Curriculum, 
The University of Texas, Austin, Texas. 

Owings, Ralph S., Head and Professor of Educational Administration, Mississippi 
Southern College, P. O. Box 27, Station A, Hattiesburg, Mississippi. 

Pace, C. Robert, Chairman, Psychology Department, Syracuse University, Syracuse 
10, New York. 

Parke, Margaret B., Associate Professor, Brooklyn College, Brooklyn 10, New York. 

Parres, John G., Director, Research and Publications, State Department of Public 
Instruction, State House Annex, Dover, Delaware. 

Parsons, R. B., Professor of Education, Murray State College, Murray, Kentucky. 

Passow, A. Harry, Associate Professor of Education and Research Associate, Horace 
Mann-Lincoln Institute of School Experimentation, Teachers College, Columbia 
University, New York 27, New York. 

Pate, Evelyn Rebecca, Vice-Dean, Division of Home Economics, Oklahoma State 
University, Stillwater, Oklahoma. 

Patterson, Franklin K., Lincoln Filene Professor of Civic Education, Tufts Univer- 
sity, Medford 55, Massachusetts. 

Patterson, Gordon E., Director of Research, Jefferson County, Colorado, School Dis- 
trict R-1, 1580 Yarrow Street, Lakewood 15, Colorado. 

Patterson, William Rex A., Director of Guidance, Arcadia High School, 180 W. 
Huntington Place, Arcadia, California. 

Pattishall, Evan G., Jr., Associate Professor of Educational Research, University of 
Virginia, Charlottesville, Virginia. 

Pattison, Mattie, Professor of Home Economics Education, Iowa State College, Ames, 
Iowa. 

+ Paul, Joseph B. (Retired), Warren, Indiana. (Formerly Director, Bureau of Re- 
search, Iowa State Teachers College, Cedar Falls, Iowa.) 

Pauly, Frank R., Director of Research, Tulsa Public Schools, Box 131, Tulsa, Okla- 
homa. 

Payne, Joseph C., Educational Research Consultant, Indianapolis Public Schools, 
150 North Meridian Street, Indianapolis 4, Indiana. 

Perdew, Philip W., Professor of Education, University of Denver, Denver 10, Colo- 
rado. 

Perloff, Robert, Director of Test Research, Science Research Associates, 57 West 
Grand Avenue, Chicago 10, Illinois. 


Perry, Raymond C., Professor of Education, University of Southern California, Los 
Angeles 7, California. 
+ Perry, Winona M., Emeritus Professor of Educational Psychology and Measure- 
ments, University of Nebraska. (Mailing address: 92 Whitmarsh Street, Providence 
7, Rhode Island.) 

Peters, Herman J., Associate Professor of Education, Armory 4, The Ohio State 
University, Columbus 10, Ohio. 

Peterson, Elmer T., Dean, College of Education, State University of Iowa, Iowa City, 
Iowa. 

Peterson, LeRoy, Coordinator of Extension, Milwaukee Area, University Extension 
Division, University of Wisconsin, 1106 Wisconsin Tower Building, 606 West 
Wisconsin Avenue, Milwaukee 3, Wisconsin. 

Peterson, Shailer, Secretary, Council on Dental Education, American Dental Asso- 
ciation, Chicago, Illinois. 

Petty, Walter T., Assistant Professor of Education, Sacramento State College, 6000 
J Street, Sacramento 19, California. 

Pflieger, Elmer F., Coordinator, Television Teaching Project, Detroit Public Schools, 
9345 Lawton, Detroit 6, Michigan. 

Phay, John E., Professor of Education, Director of the Bureau of Educational Re- 
search and Director of Summer Session, University of Mississippi, University, 
Mississippi. 

Phillips, Beeman N., Assistant Professor of Educational Psychology, Department of 
Educational Psychology, The University of Texas, Austin, Texas. 

Phillips, Murray G., Coordinator of Instructional Materials, Garden City Public 
Schools, 61 Hilton Avenue, Garden City, New York. 

Phipps, George C., Principal, J. N. Thorpe School, 8914 Buffalo Avenue, Chicago, 
Illinois. 

Pierce, Truman M., Dean, School of Education, Alabama Polytechnic Institute, 
Auburn, Alabama. 

Pindell, Watson F., Specialist in Research, Department of Education, 3 East 25th 
Street, Baltimore 18, Maryland. 

Pirie, Duncan A. S., Head, Exact Sciences Department, Detroit Public Schools, 4628 
Devonshire Road, Detroit 24, Michigan. 

Pitkin, Fred E., Research Director, Massachusetts Teachers Association, 14 Beacon 
Street, Boston 8, Massachusetts. 

Pitt, Clifford C., Associate Professor, Ontario College of Education, University of 
Toronto, 371 Bloor Street, West, Toronto 5, Ontario, Canada.

Plumlee, Lynnette B. (Mrs. R. H.), Personnel Research and Testing, Sandia Cor- 
poration, Albuquerque, New Mexico. 

Polansky, Lucy, Assistant Professor of Education, Queens College, Flushing 67, New 
York. 

Polley, John W., Professor of Education, Teachers College, Columbia University, 
New York 27, New York. 

Polster, Arthur H., Assistant Superintendent, Sacramento City Unified School Dis- 
trict, P. O. Box 2271, Sacramento 10, California. 

Popham, W. James, Institute of Educational Research, Indiana University, Rogers- 
K, Bloomington, Indiana. 

§ Porter, Douglas, Instructor in Education, Harvard University, Graduate School of 
Education, Laboratory for Research in Instruction, 7 Kirkland Street, Cambridge 38, 
Massachusetts. 

Porter, Gerald A., Professor of Education, University of Oklahoma, Norman, Okla- 
homa. 

Porter, Robert M., Associate Professor of Education, State University Teachers Col- 
lege, Oneonta, New York. 

Potter, Mary A. (Retired), 1533 College Avenue, Racine, Wisconsin. (Address: 
October 15 to May 15, Route 1, Stuart, Florida.) (Formerly Consultant in Mathe- 
matics, Board of Education, City Hall, Racine, Wisconsin.) 

Potter, Muriel C. (Mrs. Harry Langman), Associate Professor of Education, East- 
ern Michigan College, Ypsilanti, Michigan. 

Potthoff, Edward F., Director, Bureau of Institutional Research, University of Illinois,
1114 West Green Street, Urbana, Illinois.

Pounds, Ralph L., Professor of Education, Teachers College, University of Cincinnati,
Cincinnati 21, Ohio.


Powell, Jackson O., Dean, College of Education, University of Wichita, Wichita 14,
Kansas.

Powell, Marvin, Assistant Professor of Education, Division of Education, Western 
Reserve University, Cleveland 6, Ohio. 

Pratt, Edward, Associate Professor of Education, School of Education, Southern 
Methodist University, Dallas 5, Texas. 

Prentis, Roy C., Assistant Professor, College of Education, University of Minnesota, 
224 Burton Hall, Minneapolis 14, Minnesota. 

Prescott, George A., Director of Guidance and Special Services, Norwalk Public 
Schools, Norwalk, Connecticut. 

Pressey, Sidney L., Professor of Psychology, Ohio State University, Columbus 10,
Ohio.


Preston, Ralph C., Professor of Education, University of Pennsylvania, Philadelphia, 
Pennsylvania. 

Price, Robert Diddams, Assistant Dean, Teachers College, University of Cincin- 
nati, Cincinnati 21, Ohio. 

Pruett, Rolla F., Director of Research, State Department of Public Instruction, 
Indianapolis 4, Indiana. 

Pugmire, D. Ross, Professor of Education, University of Oklahoma, Norman,
Oklahoma.

Quigley, Eileen E., Dean, School of Home Economics, Southern Illinois University, 
Carbondale, Illinois. 

Rabinowitz, William, Instructor in Education, Division of Teacher Education, Office 
of Research and Evaluation, 535 East 80th Street, New York 21, New York. 

bay 10 Sa Mack A., Assistant Professor of Education, Arizona State College, Tempe,
Arizona.

Rand, E. W., Executive Dean, Jarvis Christian College, Hawkins, Texas. 

Rankin, Paul T., Assistant Superintendent of Schools, Detroit Public Schools, Detroit, 
Michigan. (President of AERA, 1933-34.) 

Rarick, G. Lawrence, Professor of Physical Education, University of Wisconsin,
Madison 6, Wisconsin. 

Rasmussen, Elmer M., Registrar, Dana College, Blair, Nebraska. 

Ratchick, Irving, Director of Pupil Personnel Services, Union Free School District 
#5, North Village Green, Levittown, Long Island, New York. 

Ravitz, Leonard, Lecturer in Education, School of Education, University of Dela- 
ware, Newark, Delaware. 

Raymond, Dorothy, Reading Consultant, Public Schools, Waterville, Maine. 

Read, John G., Professor of Education, The Science Education Center, School of 
Education, Boston University, 332 Bay State Road, Boston 15, Massachusetts. 

Reals, Willis H., Professor, Adult Education, Washington University, St. Louis 5, 
Missouri. 

Reason, Paul L., Specialist, Educational Records and Reports, U. S. Office of Edu- 
cation, Department of Health, Education, and Welfare, Washington 25, D. C. 

Redd, George N., Dean, Fisk University, Nashville 8, Tennessee. 

Reid, Jackson B., Associate Professor of Educational Psychology, Sutton Hall 311, 
The University of Texas, Austin 12, Texas. 

Rein, William C., Training Registrar, U. S. Government, 2430 E Street, N. W., Wash- 
ington 25, D. C. 

Reiner, William B., Research Associate, Bureau of Educational Program Research 
and Statistics, Board of Education of the City of New York, 110 Livingston Street, 
Brooklyn 1, New York. 

Reinhardt, Emma, Head, Department of Education and Psychology, Eastern Illinois 
University, Charleston, Illinois. 

Reitz, William, Professor of Educational Evaluation, Statistics and Research, Ex- 
aminer, College of Education, Wayne State University, Detroit 2, Michigan. 

Remmers, H. H., Director, Division of Educational Reference, Purdue University, 
Lafayette, Indiana. (President of AERA, 1954-55.) 


Remmlein, Mrs. Madaline Kinter, Assistant Director, Research Division, National 
Education Association, 1201 Sixteenth Street, N. W., Washington 6, D. C. 

Reusser, Walter C., Dean, Division of Adult Education and Community Service and 
Professor of Educational Administration, University of Wyoming, Laramie, Wyoming. 

Rhodes, Kathleen, Associate Professor, New York State College of Home Economics,
Cornell University, Ithaca, New York. 

Rhum, Gordon J., Associate Professor, Department of Education and Psychology, 
Iowa State Teachers College, Cedar Falls, Iowa. 

Rice, Arthur H., Editor, The Nation’s Schools, 919 North Michigan Avenue, Chicago 
11, Illinois. 

Richardson, H. D., Academic Vicepresident, Arizona State College, Tempe, Arizona. 

Richey, Herman G., Professor of Education, University of Chicago, Chicago 37,
Illinois.

Richter, Charles O., Assistant Superintendent, Newton Public Schools, Newtonville 
60, Massachusetts. 

Ricks, James H., Jr., Assistant Director, Test Division, The Psychological Corpora- 
tion, 522 Fifth Avenue, New York 36, New York. 

Rinsland, Henry D., Professor of Education, University of Oklahoma, Norman, 
Oklahoma. 

Rivlin, Harry N., Chairman, Department of Education and Director of Graduate 
Studies, Queens College, Flushing, New York. 

Robbins, Irving, Assistant Professor of Education, Queens College, Flushing, New 
York. 

Robertson, M. S. (Retired), 1115 Camelia Avenue, Baton Rouge 6, Louisiana. 

Robinson, Glen E., Assistant Director, Research Division, National Education Asso- 